
6941o4y40v

RPA member no. 10728
python basics, web page processing, case sharing, beginner learning • 0 replies • 916 views • 2020-07-13 09:56:27

Saving the source code of a crawled web page to a file

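import urllib.request

import chardet

page = urllib.request.urlopen('http://www.meituba.com/tag/juesemeinv.html')  # open the web page

htmlCode = page.read()  # fetch the page source as raw bytes

#print(chardet.detect(htmlCode))  # check which encoding the page uses

data = htmlCode.decode('utf-8')  # decode to text; assumes the page is UTF-8, adjust if chardet reports otherwise

#print(data)  # print the page source

pageFile = open('pageCode.txt', 'wb')  # open pageCode.txt for writing in binary mode

pageFile.write(htmlCode)  # write the raw bytes

pageFile.close()  # remember to close what you opened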
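
For anyone trying this on other sites, here is a slightly more defensive sketch of the decode and save steps. It is only one possible way to do it, not part of the original post: the helper names decode_page and save_page are made up for illustration, the gb18030 fallback is an assumption (many older Chinese pages are not UTF-8), and the sample bytes stand in for what page.read() would return so that it runs without a network connection.

```python
import os
import tempfile


def decode_page(raw, encodings=("utf-8", "gb18030")):
    """Try each candidate encoding in order (hypothetical helper)."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing bytes that cannot be decoded.
    return raw.decode("utf-8", errors="replace")


def save_page(raw, path):
    """Write the raw bytes to path and return the number of bytes written.

    The with-block closes the file even if write() raises, so no
    explicit close() call is needed.
    """
    with open(path, "wb") as page_file:
        return page_file.write(raw)


# Stand-in for htmlCode = page.read(); a GBK-encoded fragment.
sample = "<title>美图</title>".encode("gbk")

text = decode_page(sample)  # UTF-8 fails here, so gb18030 is used
out_path = os.path.join(tempfile.mkdtemp(), "pageCode.txt")
written = save_page(sample, out_path)
```

With a real page you would pass the result of page.read() instead of sample. Writing the bytes as-is keeps the file identical to what the server sent; decoding is only needed if you want to work with the text.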