Scraping Xiachufang's Most Popular Recipes This Week with csv + requests + BeautifulSoup

Posted on 2020-11-09, filed under Python

The requests library fetches the page, BeautifulSoup parses it and extracts the data, and the extracted data is then saved as a CSV file.

Environment

Windows 10
Python 3.8.6

Installing the libraries

Install with pipenv:

pipenv install bs4
pipenv install requests

Or install with pip (here through the Tsinghua PyPI mirror):

python3 -m pip install bs4 -i https://pypi.tuna.tsinghua.edu.cn/simple
python3 -m pip install requests -i https://pypi.tuna.tsinghua.edu.cn/simple

Fetching the data

Xiachufang's "most popular recipes this week" page: http://www.xiachufang.com/explore/

Only the recipes on the first page are extracted.

import requests

# Request headers: send a browser-like User-Agent
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36'}
# Send the request
res = requests.get('http://www.xiachufang.com/explore/', headers=headers)
# Print the status code; 200 means the request succeeded
print(res.status_code)
# Set the encoding used to decode the response body
res.encoding = 'utf-8'
# Get the response body as a string
html = res.text
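
The request above prints the status code but carries on even when the request has failed. A minimal sketch of a stricter version, assuming you would rather stop on an HTTP error: `html_from_response` and `fetch_html` are hypothetical helper names, and the 10-second timeout is an arbitrary choice. `raise_for_status()` and the `timeout` argument are standard requests features.

```python
import requests

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36'
}


def html_from_response(res):
    """Return the body of a requests.Response as text, or raise on an HTTP error."""
    # Raises requests.HTTPError for 4xx/5xx status codes
    res.raise_for_status()
    # Decode the body as UTF-8, as in the snippet above
    res.encoding = 'utf-8'
    return res.text


def fetch_html(url, timeout=10):
    """Hypothetical helper: GET the page and return its HTML."""
    # timeout (in seconds) keeps the script from hanging on a dead connection
    res = requests.get(url, headers=HEADERS, timeout=timeout)
    return html_from_response(res)
```

With these helpers, `html = fetch_html('http://www.xiachufang.com/explore/')` would stand in for the request lines above.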

Parsing and extracting the data

from bs4 import BeautifulSoup

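# The first argument is the markup to parse (here a string); the second names the parser
soup = BeautifulSoup(html, 'html.parser')
# Find the tags that contain the recipe names
items = soup.find_all('p', class_='name')
# Find the tags that contain the ingredients
ingredients = soup.find_all('p', class_='ing ellipsis')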

# Collect one row per recipe: [name, ingredients, link]
rows = []
for item in range(len(items)):
    link = items[item].find('a')
    rows.append([
        # strip() removes the whitespace surrounding the text
        link.text.strip(),
        ingredients[item].text.strip(),
        'http://www.xiachufang.com' + link['href']
    ])

Saving the data

import csv

# Create a file named 下厨房本周最受欢迎菜谱.csv
with open(r'下厨房本周最受欢迎菜谱.csv', 'w', newline='', encoding='utf-8') as f:
    # Create the writer object
    writer = csv.writer(f)
    # Write the header row: name, ingredients, link
    writer.writerow(['菜名', '食材', '链接'])

    # Write every row extracted with BeautifulSoup
    writer.writerows(rows)
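
To try the parsing and saving steps without hitting the site, the same calls can be run on a small inline HTML sample. This is only a sketch under assumptions: the `SAMPLE_HTML` markup is made up to mimic the `p.name` and `p.ing.ellipsis` tags the code relies on and is not copied from the real page, and `extract_rows` and `rows_to_csv` are hypothetical helper names. The CSV is written to an in-memory `io.StringIO` rather than to a file.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up markup that mimics the structure the scraper expects
SAMPLE_HTML = """
<div class="info">
  <p class="name">
    <a href="/recipe/1/">
      Tomato and egg stir-fry
    </a>
  </p>
  <p class="ing ellipsis">
    <a href="/category/a/">tomato</a>、<a href="/category/b/">egg</a>
  </p>
</div>
<div class="info">
  <p class="name">
    <a href="/recipe/2/">
      Cola chicken wings
    </a>
  </p>
  <p class="ing ellipsis">
    <span>chicken wings</span>、<span>cola</span>
  </p>
</div>
"""


def extract_rows(html, base='http://www.xiachufang.com'):
    """Return one [name, ingredients, link] row per recipe found in html."""
    soup = BeautifulSoup(html, 'html.parser')
    names = soup.find_all('p', class_='name')
    ingredients = soup.find_all('p', class_='ing ellipsis')
    rows = []
    # zip() stops at the shorter list, so unequal lengths cannot raise IndexError
    for name, ing in zip(names, ingredients):
        link = name.find('a')
        rows.append([link.text.strip(), ing.text.strip(), base + link['href']])
    return rows


def rows_to_csv(rows):
    """Write the header and all rows to an in-memory buffer and return the CSV text."""
    buf = io.StringIO(newline='')
    writer = csv.writer(buf)
    writer.writerow(['菜名', '食材', '链接'])
    writer.writerows(rows)
    return buf.getvalue()


rows = extract_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

Reading the text back with `csv.reader` is a quick way to confirm that every recipe ended up on its own row.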

Data scraped on 2020/11/9

Where to find the full code

Full code