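I'm learning to write web scrapers in Python and want to simulate logging in to this site ( www.v2ex.com ), but I can't get it to work no matter what I try. My code is below. Could anyone take a look? Thanks.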

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
from bs4 import BeautifulSoup

login_url=r'https://www.v2ex.com/signin'
headers = {
    "content-type":"application/x-www-form-urlencoded",
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
    'Origin': 'https://www.v2ex.com',
    'Referer': 'https://www.v2ex.com/signin'
}
userName='pive'
password='******'
s=requests.Session()
res=s.get(login_url,headers=headers)
soup=BeautifulSoup(res.content,"html.parser")
once=soup.find("input",{"name":"once"})["value"]
formUserName=soup.find("input",type="text")["name"]
formPassword=soup.find("input",type="password")["name"]
print(once+"\n"+userName+"\n"+password)
post_data={
    formUserName:userName,
    formPassword:password,
    "once":once,
    "next":"/settings"
}
s.post(login_url,post_data,headers=headers)
f = s.get('https://www.v2ex.com/settings',headers=headers)
with open('v2ex.html',"wb") as v2ex:
    v2ex.write(f.content)

Addendum 1 · 2017-02-20 13:03:53 +08:00

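Thanks, everyone, this is solved. The cause was that I was grabbing the wrong form field for the username: soup.find("input",type="text") searches the whole page, so it presumably matched a text input outside the sign-in form. Scoping the lookup to the form, as below, fixed it.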

soup=BeautifulSoup(res.content,"html.parser")
form=soup.find("form",action="/signin")
once=form.find("input",{"name":"once"})["value"]
formUserName=form.find("input",type="text")["name"]
formPassword=soup.find("input",type="password")["name"]
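
To illustrate why scoping the lookup to the form matters, here is a minimal sketch that uses only the standard library's html.parser instead of BeautifulSoup. The sample page, the search box in front of the form, and the field names are all made up for illustration; the real sign-in page may look different.

```python
from html.parser import HTMLParser

# Hypothetical page: a search box precedes the sign-in form. The field
# names are made-up stand-ins for whatever names the real page uses.
SAMPLE_HTML = """
<input type="text" name="q" id="search">
<form method="post" action="/signin">
  <input type="text" class="sl" name="u_3f9a">
  <input type="password" class="sl" name="p_77c1">
  <input type="hidden" name="once" value="12345">
</form>
"""


class InputCollector(HTMLParser):
    """Collect <input> tags, remembering which form (if any) encloses each."""

    def __init__(self):
        super().__init__()
        self.current_action = None  # action of the <form> we are inside
        self.inputs = []            # list of (form_action, attrs_dict)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.current_action = attrs.get("action")
        elif tag == "input":
            self.inputs.append((self.current_action, attrs))

    def handle_endtag(self, tag):
        if tag == "form":
            self.current_action = None


def first_text_input(html, form_action=None):
    """Return the name of the first text input, optionally limited to one form."""
    parser = InputCollector()
    parser.feed(html)
    for action, attrs in parser.inputs:
        if attrs.get("type") != "text":
            continue
        if form_action is None or action == form_action:
            return attrs.get("name")
    return None


page_wide = first_text_input(SAMPLE_HTML)                         # whole page
form_only = first_text_input(SAMPLE_HTML, form_action="/signin")  # sign-in form only
print(page_wide)
print(form_only)
```

On this sample the page-wide lookup returns the search box's name, while the form-scoped lookup returns the username field, which is the same difference as between soup.find and form.find above.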

15 replies • 2017-05-25 14:13:46 +08:00

5dkgansm

2017-02-20 10:48:45 +08:00

And after the POST? What about the cookies?

liuxu

2017-02-20 10:55:53 +08:00

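I've only just started learning too. How about printing post_data and headers with pprint,
or maybe post_data needs to be passed explicitly as data=?
s.post(login_url,data=post_data,headers=headers)

I parse the page with lxml's etree; the rest is about the same as yours, and it now collects my daily coins automatically.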
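
On the data= question: in requests, the second positional parameter of Session.post is data, so s.post(login_url,post_data,headers=headers) and s.post(login_url,data=post_data,headers=headers) send the same body. For the pprint suggestion, a minimal sketch using only the standard library follows. The field names and values are placeholders, and urlencode is used here to mirror the form encoding that would be sent for a plain dict of strings.

```python
from pprint import pformat
from urllib.parse import urlencode

# Placeholder form data; the two field names are hypothetical stand-ins
# for the names read from the sign-in page.
post_data = {
    "u_3f9a": "pive",
    "p_77c1": "secret",
    "once": "12345",
    "next": "/settings",
}

# Readable dump of the dict, useful for spotting a wrong or missing key.
print(pformat(post_data))

# The application/x-www-form-urlencoded body for this dict.
body = urlencode(post_data)
print(body)
```

If a key in the dump is not one of the field names in the sign-in form, the server will not see the username or password at all.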

pive

2017-02-20 11:01:46 +08:00

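@5dkgansm The POST only logs in. After logging in I open a link that requires login ( https://www.v2ex.com/settings ); if everything worked I should land on that page, but right now I get bounced back to the sign-in page (with the message: "The page you want to view requires you to sign in first"). Doesn't requests.Session() handle cookies automatically?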
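
A Session does keep cookies between requests, and printing s.cookies.get_dict() after the POST shows what it stored. To detect the bounce in code rather than by opening the saved HTML, one option is to look at the final URL after redirects. The sketch below assumes the site redirects unauthenticated requests to a /signin path, which the thread does not confirm; bounced_to_signin is a hypothetical helper.

```python
from urllib.parse import urlsplit


def bounced_to_signin(final_url, signin_path="/signin"):
    """Return True if the final URL after redirects is the sign-in page."""
    return urlsplit(final_url).path.rstrip("/") == signin_path


# With requests this would be called as bounced_to_signin(f.url), where
# f = s.get('https://www.v2ex.com/settings', headers=headers).
bounced = bounced_to_signin("https://www.v2ex.com/signin?next=/settings")
landed = bounced_to_signin("https://www.v2ex.com/settings")
print(bounced, landed)
```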

lwjcjmx123

2017-02-20 11:10:16 +08:00 via Android

@liuxu The automatic coin collection caught my eye. Do you have it as a runnable script or something?

siloong

2017-02-20 11:16:47 +08:00

```python3
# -*- coding: utf-8 -*-
import re

import requests
from lxml import etree

signin='https://v2ex.com/signin'
home='https://v2ex.com'
url='https://v2ex.com/mission/daily'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
    'Origin': 'https://www.v2ex.com',
    'Referer': 'https://www.v2ex.com/signin',
    'Host': 'www.v2ex.com',
}


def sign(username,passwd):
    session=requests.Session()
    # Merge into the session defaults instead of replacing them.
    session.headers.update(headers)
    # Fetch the sign-in page; the field names and the once token come from it.
    loginhtm=session.get(signin,verify=False).content
    page=etree.HTML(loginhtm)
    # The username and password inputs both carry class "sl", in that order.
    x=page.xpath("//input[@class='sl']/@name")
    usernameform=x[0]
    passwdform=x[1]
    onceform=page.xpath("//input[@name='once']/@value")[0]
    data={
        usernameform: username,
        passwdform: passwd,
        'once': onceform,
        'next': '/',
    }
    session.post(signin,data=data,verify=False)
    # The daily mission page carries the next URL in a location.href assignment.
    daily=session.get(url).content.decode('UTF-8')
    qiandao=re.findall("location.href = '(.*?)'",daily)[0]
    if qiandao == '/balance':
        print('Already checked in today')
    else:
        session.get(home+qiandao,verify=False)
        print('Check-in successful')


if __name__=='__main__':
    username='siloong'
    passwd='123456'
    # verify=False triggers InsecureRequestWarning; silence it.
    requests.packages.urllib3.disable_warnings()
    sign(username,passwd)
```
-- code found on the internet
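
One caveat with the script above: re.findall(...)[0] raises IndexError when the pattern is missing, for example if the login failed and the mission page was never reached. A minimal sketch of a more forgiving extraction follows; the sample HTML is made up and find_redirect is a hypothetical helper name.

```python
import re

# Matches the target of an inline "location.href = '...'" assignment.
REDIRECT_RE = re.compile(r"location\.href\s*=\s*'([^']*)'")


def find_redirect(html):
    """Return the first location.href target in html, or None if absent."""
    match = REDIRECT_RE.search(html)
    return match.group(1) if match else None


# Hypothetical snippets standing in for the daily mission page.
claim_page = """<input type="button" onclick="location.href = '/mission/daily/redeem?once=12345';" />"""
signin_page = "<p>please sign in</p>"

target = find_redirect(claim_page)
nothing = find_redirect(signin_page)
print(target)
print(nothing)
```

Returning None lets the caller print a "not logged in?" style message instead of crashing with a traceback.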

liuxu

2017-02-20 11:17:51 +08:00

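@lwjcjmx123 Would posting this break the rules? See my github/shell/v2ex files. selenium_v2ex is done with selenium; the Raspberry Pi is ARM, which PhantomJS doesn't support, so I did it again with requests. Remember to replace username and password.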

xvx

2017-02-20 11:26:50 +08:00 via iPhone

password='******'
Hint: wrong password.

Hahahaha.

mytsing520

2017-02-20 11:29:39 +08:00

@Livid

MyFaith

2017-02-20 11:31:03 +08:00

import re

import requests


class v2ex():
    def __init__(self):
        self.s = requests.Session()
        self.headers = {
            'Host':'www.v2ex.com',
            'Origin':'http://www.v2ex.com',
            'Referer':'http://www.v2ex.com/signin',
            'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'
        }

    def get_requires_data(self):
        # Read the once token and the form field names from the sign-in page.
        html = self.s.get('http://www.v2ex.com/signin', headers=self.headers).text
        once = re.search(r'value="(\d{5})"', html).group(1)
        username = re.search(r'type="text" class="sl" name="(.*?)"', html).group(1)
        password = re.search(r'type="password" class="sl" name="(.*?)"', html).group(1)
        return {
            'once': once,
            'username': username,
            'password': password
        }

    def login(self, username, password):
        data = self.get_requires_data()
        form_data = {
            data['username']: username,
            data['password']: password,
            'once': data['once'],
            'next': '/'
        }
        res = self.s.post('http://www.v2ex.com/signin', data=form_data, headers=self.headers)
        return res

    def sign(self):
        html = self.s.get('http://www.v2ex.com/mission/daily', headers=self.headers).text
        once_code = re.search(r'signout\?once=(\d{5})', html).group(1)
        url = 'http://www.v2ex.com/mission/daily/redeem?once=' + once_code
        res = self.s.get(url, headers=self.headers)
        result = res.text
        days = re.search(r'已连续登录 (\d+) 天', result).group(1)
        # Use "in", not str.find: find returns -1 (truthy) when the text is absent.
        if '今天的登录奖励已经领取过了哦' not in result:
            # Marker not found, so the check-in succeeded.
            print('[v2ex] Check-in successful, %s consecutive days so far' % days)
        else:
            # Marker found, so today's reward had already been claimed.
            print('[v2ex] Check-in failed: already checked in today, %s consecutive days so far' % days)
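
One detail worth spelling out for the "already claimed" check in sign(): str.find returns an index, or -1 when the text is absent, never a boolean, so negating its result does not test for absence. A small sketch with made-up page text (the marker is the site's "today's login reward has already been claimed" message):

```python
MARKER = "今天的登录奖励已经领取过了哦"

# Hypothetical response bodies.
page_without = "已连续登录 3 天"
page_with = MARKER + "，已连续登录 3 天"  # marker happens to sit at index 0

# `not text.find(x)` is True only when x is found at index 0.
find_absent_a = not page_without.find(MARKER)  # find gives -1, so this is False
find_absent_b = not page_with.find(MARKER)     # find gives 0, so this is True

# The membership test answers the question that was actually asked.
in_absent_a = MARKER not in page_without
in_absent_b = MARKER not in page_with

print(find_absent_a, find_absent_b)
print(in_absent_a, in_absent_b)
```

Both find-based results are the opposite of the truth here, which is why a membership test with in is the safer way to branch on the page text.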