The Requests Library

Quickstart: Requests documentation

1.1

The 7 main methods of the requests library
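
The heading above does not name the seven methods. The sketch below assumes they are `request`, `get`, `head`, `post`, `put`, `patch`, and `delete`, which is how the library's top-level helpers are usually counted. Nothing is sent over the network; the final request is only prepared locally so its parts can be inspected, and the `example.com` URL is a placeholder.

```python
import requests

# Assumed list: the heading says "7 main methods" without naming them.
# Each helper maps to one HTTP verb; request() takes the verb as an argument.
MAIN_METHODS = {
    "request": "any verb, passed as the first argument",
    "get": "GET",
    "head": "HEAD",
    "post": "POST",
    "put": "PUT",
    "patch": "PATCH",
    "delete": "DELETE",
}

for name, verb in MAIN_METHODS.items():
    available = callable(getattr(requests, name))
    print(f"requests.{name}() -> {verb} (available: {available})")

# Build a PUT request locally (nothing is sent) to inspect what would go out.
prepared = requests.Request(
    "PUT", "http://example.com/item", data={"k": "v"}
).prepare()
print(prepared.method, prepared.url, prepared.body)
```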

Response object attributes
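
The attributes themselves are not listed here, so the following is only a sketch of the ones the later examples rely on: `status_code`, `content`, `encoding`, `apparent_encoding`, and `text`. To keep it runnable offline, the `Response` is built by hand and its private `_content` attribute is assigned directly. That is an illustration shortcut, not how a response is normally obtained.

```python
import requests

# Hand-built Response so the example runs offline. Assigning the private
# _content attribute is a shortcut for illustration only; real code receives
# a Response from requests.get() and never touches it.
body = "<title>京东</title>"
r = requests.Response()
r.status_code = 200
r._content = body.encode("utf-8")

print(r.status_code)   # HTTP status code; 200 means success
print(r.content)       # raw body as bytes

r.encoding = "ISO-8859-1"   # pretend the headers suggested the wrong charset
garbled = r.text            # r.text decodes r.content using r.encoding

r.encoding = "utf-8"        # in practice: r.encoding = r.apparent_encoding
readable = r.text

print(garbled)
print(readable)
print(r.apparent_encoding)  # charset guessed from the body; a guess, not a guarantee
```

The point of the comparison is that `r.text` is only as good as `r.encoding`, which is why the examples in section 2 overwrite it with `r.apparent_encoding` before printing.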

Requests library exceptions
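
The exception list is not reproduced in these notes. As a sketch, assuming the usual six (`ConnectionError`, `HTTPError`, `URLRequired`, `TooManyRedirects`, `ConnectTimeout`, `Timeout`), the code below shows that they all derive from `requests.exceptions.RequestException`, so a single `except` clause can catch any of them. The failing call uses a URL without a scheme, which requests rejects before any network access.

```python
import requests
from requests import exceptions as exc

# Assumed selection of commonly cited exceptions; all live in requests.exceptions.
COMMON = [
    exc.ConnectionError,    # network problem such as DNS failure or refused connection
    exc.HTTPError,          # raised by raise_for_status() on a 4xx/5xx status
    exc.URLRequired,        # a valid URL is required to make a request
    exc.TooManyRedirects,   # redirect limit exceeded
    exc.ConnectTimeout,     # timed out while connecting to the server
    exc.Timeout,            # the request timed out
]

for cls in COMMON:
    print(cls.__name__, issubclass(cls, exc.RequestException))

try:
    # Missing "http://": rejected while preparing the request, before any I/O.
    requests.get("www.baidu.com", timeout=5)
except exc.RequestException as err:
    caught = type(err).__name__
    print("caught:", caught)
```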

raise_for_status()
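
A minimal sketch of what `raise_for_status()` does: it returns quietly for a successful status and raises `requests.HTTPError` for 4xx and 5xx codes. The helper name `check` is made up for this example, and the `Response` objects are built by hand so no network access is needed.

```python
import requests

def check(status_code):
    """Return None if the status passes, otherwise the HTTPError message."""
    r = requests.Response()      # hand-built so no network access is needed
    r.status_code = status_code
    try:
        r.raise_for_status()
    except requests.HTTPError as err:
        return str(err)
    return None

print(check(200))   # success: nothing is raised
print(check(404))   # client error: HTTPError
print(check(503))   # server error: HTTPError
```

This is why the examples in section 2 call `r.raise_for_status()` right after `requests.get()`: a failed request then lands in the `except` branch instead of printing an error page.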

A general code framework for fetching web pages
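
The framework itself is not shown in these notes. The sketch below is one plausible shape for it, pieced together from the pattern the examples in section 2 share (`get`, `raise_for_status`, `apparent_encoding`, `except`). The function name `get_html_text`, the 30-second timeout, and the choice to return `None` on failure are assumptions.

```python
import requests

def get_html_text(url, timeout=30):
    """Return the page text, or None if the request fails for any reason."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()                # turn 4xx/5xx into an exception
        r.encoding = r.apparent_encoding    # decode with the charset guessed from the body
        return r.text
    except requests.RequestException as err:
        print(f"Fetch failed: {err}")
        return None
```

A caller would then write something like `html = get_html_text("http://www.baidu.com")` and check for `None` before using the result. Catching `requests.RequestException` rather than using a bare `except` keeps unrelated bugs, such as a `NameError`, from being silently reported as a failed fetch.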

Comparing the output of the general code framework

1.2 requests.get(url, params=None, **kwargs)

params
data
json
headers
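
To see what each of these four keyword arguments changes, the sketch below prepares requests locally without sending them. The helper `prepare` and the `example.com` URLs are made up for illustration; `data` and `json` are shown with POST because that is where a request body is normally used.

```python
import json
import requests

def prepare(method, url, **kwargs):
    """Build the request locally without sending it."""
    return requests.Request(method, url, **kwargs).prepare()

# params: appended to the URL as a query string
p = prepare("GET", "http://www.baidu.com/s", params={"wd": "Python"})
print(p.url)

# data: form-encoded into the request body
d = prepare("POST", "http://example.com/post", data={"key1": "value1"})
print(d.headers["Content-Type"], d.body)

# json: serialized as JSON into the request body
j = prepare("POST", "http://example.com/post", json={"key1": "value1"})
print(j.headers["Content-Type"], json.loads(j.body))

# headers: custom HTTP headers; lookups are case-insensitive
h = prepare("GET", "http://example.com/", headers={"user-agent": "Mozilla/5.0"})
print(h.headers["User-Agent"])
```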

2 Fetching page source

2.1 Fetching a JD.com page: the basic approach

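import requests

url = "https://item.jd.com/2967929.html"
try:
    r = requests.get(url, timeout=30)   # give up after 30 seconds instead of hanging
    r.raise_for_status()                # raise HTTPError on a 4xx/5xx status
    r.encoding = r.apparent_encoding    # decode with the charset guessed from the body
    print(r.text)
except requests.RequestException:
    print("Fetch failed")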

2.2 Fetching an Amazon page: posing as a browser

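import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    kv = {'user-agent': 'Mozilla/5.0'}              # identify as a browser
    r = requests.get(url, headers=kv, timeout=30)
    r.raise_for_status()                            # raise HTTPError on a 4xx/5xx status
    r.encoding = r.apparent_encoding                # decode with the charset guessed from the body
    print(r.text)
except requests.RequestException:
    print('Fetch failed')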
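
The reason the `user-agent` header matters can be seen without sending anything. By default requests announces itself as `python-requests/<version>`, and a site may choose to reject clients that identify themselves as scripts. The sketch below prepares the same request with and without the custom header through a `Session`, which is where the default headers are merged in.

```python
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
session = requests.Session()

# Prepare (but do not send) the request with and without a custom header.
default_req = session.prepare_request(requests.Request("GET", url))
custom_req = session.prepare_request(
    requests.Request("GET", url, headers={"user-agent": "Mozilla/5.0"})
)

print(default_req.headers["User-Agent"])   # the library's own identifier
print(custom_req.headers["User-Agent"])    # the value supplied by the caller
```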

2.3 Keyword search on 360 and Baidu

Search engine keyword submission interfaces

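import requests

keyword = 'Python'
try:
    kv = {'wd': keyword}                                  # Baidu's query parameter
    r = requests.get("http://www.baidu.com/s", params=kv, timeout=30)
    print(r.request.url)                                  # the URL that was actually requested
    r.raise_for_status()                                  # raise HTTPError on a 4xx/5xx status
    print(len(r.text))
except requests.RequestException:
    print("Fetch failed")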
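
The section title mentions 360 as well, but only the Baidu request is shown. The sketch below builds the search URL for both engines without sending anything. The Baidu endpoint and its `wd` parameter come from the code above; the 360 endpoint (`http://www.so.com/s` with parameter `q`) is an assumption filled in here and should be checked against the live site. The names `SEARCH_ENGINES` and `build_search_url` are made up for this example.

```python
import requests

# engine -> (endpoint, query parameter name)
# The 360 entry is an assumption; only the Baidu one appears in the notes.
SEARCH_ENGINES = {
    "baidu": ("http://www.baidu.com/s", "wd"),
    "360": ("http://www.so.com/s", "q"),
}

def build_search_url(engine, keyword):
    """Return the URL requests would fetch for this keyword, without sending it."""
    base, param = SEARCH_ENGINES[engine]
    return requests.Request("GET", base, params={param: keyword}).prepare().url

print(build_search_url("baidu", "Python"))
print(build_search_url("360", "Python"))
print(build_search_url("baidu", "网络爬虫"))   # non-ASCII keywords are percent-encoded
```

Passing the keyword through `params` instead of concatenating strings means requests handles the URL encoding, which matters as soon as the keyword contains spaces or Chinese characters.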