spider #34

Open
AI0TSec opened this issue Jul 16, 2019 · 0 comments

AI0TSec commented Jul 16, 2019

The Requests library

JD.com

import requests
url = "https://item.jd.com/100005603522.html"
try:
    r = requests.get(url)
    r.raise_for_status()              # raise HTTPError on 4xx/5xx responses
    r.encoding = r.apparent_encoding  # guess the encoding from the body
    print(r.text[:1000])
except requests.RequestException:
    print("Crawl failed")

Amazon

import requests
url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    # Amazon rejects the default python-requests User-Agent,
    # so pretend to be a browser
    kv = {'user-agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print("Crawl failed")

Baidu

import requests
url = "http://www.baidu.com/s"
keyword = "Python"
try:
    kv = {'wd': keyword}  # Baidu's query parameter is 'wd'
    r = requests.get(url, params=kv)
    print(r.request.url)  # the full URL that was actually requested
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Crawl failed")

360

import requests
url = "https://www.so.com/s"
keyword = "python"
try:
    kv = {'q': keyword}  # 360 Search's query parameter is 'q'
    r = requests.get(url, params=kv)
    print(r.request.url)
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Crawl failed")

Image crawling

import requests
import os
url = "http://image.ngchina.com.cn/userpic/99679/2019/0615113404996793383.jpeg"
root = "D:/pics/"
path = root + url.split('/')[-1]  # reuse the file name from the URL
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        r.raise_for_status()
        # binary mode: r.content holds the raw bytes of the image;
        # the with block closes the file automatically
        with open(path, 'wb') as f:
            f.write(r.content)
        print("File saved successfully")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Crawl failed")

IP address geolocation

import requests
url = "http://m.ip138.com/ip.asp?ip="
try:
    r = requests.get(url + "202.204.80.112")
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])  # the lookup result appears near the end of the page
except requests.RequestException:
    print("Crawl failed")

References

https://www.cnblogs.com/zhaof/tag/%E7%88%AC%E8%99%AB/

AI0TSec changed the title from web crawler to spider on Jul 16, 2019