spider #34

Open
AI0TSec opened this issue Jul 16, 2019 · 0 comments

AI0TSec commented Jul 16, 2019

The Requests library

JD.com

import requests
url = "https://item.jd.com/100005603522.html"
try:
    r = requests.get(url)
    r.raise_for_status()              # raise HTTPError on 4xx/5xx responses
    r.encoding = r.apparent_encoding  # guess the encoding from the body
    print(r.text[:1000])
except requests.RequestException:
    print("Crawl failed")

Amazon

import requests
url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    # Amazon rejects the default python-requests User-Agent,
    # so pretend to be a browser
    kv = {'user-agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print("Crawl failed")

Baidu

import requests
url = "http://www.baidu.com/s"
keyword = "Python"
try:
    kv = {'wd': keyword}  # Baidu's query parameter is 'wd'
    r = requests.get(url, params=kv)
    print(r.request.url)  # the full URL that was actually requested
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Crawl failed")

360

import requests
url = "https://www.so.com/s"
keyword = "python"
try:
    kv = {'q': keyword}  # 360 Search's query parameter is 'q'
    r = requests.get(url, params=kv)
    print(r.request.url)
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Crawl failed")

Image crawling

import requests
import os
url = "http://image.ngchina.com.cn/userpic/99679/2019/0615113404996793383.jpeg"
root = "D:/pics/"
path = root + url.split('/')[-1]  # reuse the file name from the URL
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        r.raise_for_status()
        # binary mode: r.content holds the raw bytes of the image;
        # the with block closes the file automatically
        with open(path, 'wb') as f:
            f.write(r.content)
        print("File saved successfully")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Crawl failed")

IP address geolocation

import requests
url = "http://m.ip138.com/ip.asp?ip="
try:
    r = requests.get(url + "202.204.80.112")
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])  # the lookup result appears near the end of the page
except requests.RequestException:
    print("Crawl failed")

References

https://www.cnblogs.com/zhaof/tag/%E7%88%AC%E8%99%AB/

AI0TSec changed the title from web crawler to spider on Jul 16, 2019