Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[已解决] PPPOE拦截法 导致路由器负载过高而崩溃的问题 #195

Closed
xinxin8816 opened this issue Nov 25, 2017 · 19 comments
Closed

Comments

@xinxin8816
Copy link

xinxin8816 commented Nov 25, 2017

2017-12-7更新:
我的问题是WAN接口协议配置为PPPOE产生大量无效拨号,日志文件过多导致路由器崩溃。
此问题的通用解决方法见下,或参考huipengly,gmfghp等人的回复。

这个问题仅在没有心跳的地区且路由器长期不重启(包括不限于晚上不断电、断电但有UPS或魔改充电宝给供电)时发生,不满足这个条件的同学可以无视了。
问题在于原有程序,每隔十秒运行一遍PPPOE,导致PPPOE日志文件越来越大(每分钟生成5-6KB,大概不到一个星期就是近百MB的日志文件),如果路由器性能够强,则无压力。但是如果路由器内存较少,kswapd0会将部分进程的页从物理内存交换到SWAP上,这个过程CPU基本会跑满。
开始想到的解决方法是,PING百度服务器,PING不通(即DNS无法解析)时再运行PPPOE,但是试过后发现如果没有接口连接到广域网PING指令根本不能用。
再想到的解决办法,就是转存用户名和密码所在行。日志虽然多,但NK4需要的有价值的信息无外乎就几行,所以:
1、在while do后面加一句提取并转存
echo "$(grep 'user=' /tmp/pppoe.log | grep 'rcvd' | tail -n 1)" >> /tmp/user.log
2、把grep原有路径/tmp/pppoe.log改成/tmp/user.log
3、sleep命令前加一句清除
cat /dev/null > /tmp/pppoe.log
没学过Shell命令,现学现卖,有建议或意见欢迎大佬指出。对了,换行也会被记录到user.log,可以写个if避免这种情况,但反正换行又不占地就算啦

@huipengly
Copy link
Contributor

感谢贡献。写的时候没这个需求,就没考虑到。如果你测试没问题,可以fork本项目,自己修改好,然后pull requests到本项目,让脚本适应性更好。

@Warm-rain
Copy link

k2未断电运行一个月了我竟然没发现,不过的确比以前卡点了,但是没崩溃

@huipengly
Copy link
Contributor

问题在于原有程序,每隔十秒运行一遍PPPOE,导致PPPOE日志文件越来越大

pppoe.log不是脚本运行产生的,脚本只是去读取PPPoE服务器程序生成的文件,并不会写pppoe.log。上面的这个结论是错的。

我之前测试的是路由器启动后,如果pppoe服务器接收到拨号,或者路由器端进行pppoe拨号,就会有日志记录。如果路由器一直没断网,应该不会特别大。你这个日志文件特别大,我暂时也想不到原因。 @Warm-rain 你可以看一下你的日志文件么,看看是不是这么大?

@huipengly
Copy link
Contributor

pppoe.log是日志,肯定会变。现在只能希望楼主能给个几十M的日志看看到底什么情况了。

@Warm-rain
Copy link

@huipengly 感觉不存在这问题,、他的路由器奔溃应该不是另有原因,我刷的是荒野无灯k2,我刚刚看了下日志文件,只保存前十多分钟的样子,在前面的日志已经自动清除,然后我有观察了下路由器主页的内存可用数和空闲数

可用数
17260 kB / 61304 kB (28%)
空闲数
14336 kB / 61304 kB (23%)
已缓冲
2924 kB / 61304 kB (4%)

没高过30%,会一直波动,不知这个是否准确,可做参考。

@Warm-rain
Copy link

我的拨号成功后的pppoe.log后面没变过啊.
Plugin /etc/ppp/plugins/rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
[pppoe hook debug] syscall operations name probe pandorabox ok!
[pppoe hook debug] hook status=1
Connected to 38:2c:4a:1e:e4:92 via interface br-lan
using channel 1
Using interface ppp0
Connect: ppp0 <--> br-lan
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x1a700973>]
rcvd [LCP ConfReq id=0x0 <mru 1480> <magic 0x5a473b87> ]
sent [LCP ConfRej id=0x0 ]
rcvd [LCP ConfReq id=0x1 <mru 1480> <magic 0x5a473b87>]
sent [LCP ConfAck id=0x1 <mru 1480> <magic 0x5a473b87>]
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x1a700973>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0x1a700973>]
sent [LCP EchoReq id=0x0 magic=0x1a700973]
rcvd [LCP Ident id=0x2 magic=0x5a473b87 "MSRASV5.20"]
rcvd [LCP Ident id=0x3 magic=0x5a473b87 "MSRAS-0-WIN-KN3D7CGLCV2"]
rcvd [LCP Ident id=0x4 magic=0x5a473b87 "\n7v>|\37777777612'E\37777777626QG\37777777612\37777777775\37777777767\37777777635\37777777656"]
rcvd [PAP AuthReq id=0x0 user=""]
[pppoe hook debug] enter upap_authhook
[pppoe hook debug] enter pbr_set_pppoe_info
[pppoe hook debug] system command assembling
[pppoe hook debug] system command:uci set network.wan.proto=pppoe
[pppoe hook debug] system command assembling
[pppoe hook debug] system command:uci set network.wan.username=
[pppoe hook debug] system command assembling
[pppoe hook debug] system command:uci set network.wan.password=
[pppoe hook debug] system command assembling
[pppoe hook debug] system command:uci commit
[pppoe hook debug] rp_set_string ret=31
[pppoe hook debug] enter pbr_set_pppoe_info set username return=31
[pppoe hook debug] rp_set_string ret=6
[pppoe hook debug] enter pbr_set_pppoe_info set password return=6
[pppoe hook debug] exit pbr_set_pppoe_info, usr=, passwd=
[pppoe hook debug] [upap_authhook] pap username=, password=, hs=0
[pppoe hook debug] exit upap_authhook
Can't open PAP password file /etc/ppp/pap-secrets: No such file or directory
sent [PAP AuthNak id=0x0 "Login incorrect"]
PAP peer authentication failed for
sent [LCP TermReq id=0x2 "Authentication failed"]
rcvd [LCP EchoRep id=0x0 magic=0x5a473b87]
rcvd [LCP TermAck id=0x2 "Authentication failed"]
Connection terminated.
Plugin rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
[pppoe hook debug] syscall operations name probe pandorabox ok!
[pppoe hook debug] hook status=1
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src d8:c8:e9:ed:44:10
[service-name]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 37
dst d8:c8:e9:ed:44:10 src 28:31:52:57:17:4e
[service-name] [AC-name CQ-FL-22JU-BAS-2.MAN.ME60-X16]
Send PPPOE Discovery V1T1 PADR session 0x0 length 4
dst 28:31:52:57:17:4e src d8:c8:e9:ed:44:10
[service-name]
Recv PPPOE Discovery V1T1 PADS session 0x3aea length 4
dst d8:c8:e9:ed:44:10 src 28:31:52:57:17:4e
[service-name]
PADS: Service-Name: ''
PPP session is 15082
Connected to 28:31:52:57:17:4e via interface eth0.2
using channel 2
Using interface pppoe-netkeeper
Connect: pppoe-netkeeper <--> eth0.2
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0xc65871a>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0xc65871a>]
rcvd [LCP ConfReq id=0x2 <mru 1492> <magic 0xc8298ea6>]
sent [LCP ConfAck id=0x2 <mru 1492> <magic 0xc8298ea6>]
sent [LCP EchoReq id=0x0 magic=0xc65871a]
syncppp not active
sent [PAP AuthReq id=0x1 user="" password=""]
rcvd [LCP EchoRep id=0x0 magic=0xc8298ea6]
rcvd [PAP AuthAck id=0x1 "2001"]
Remote message: 2001
PAP authentication succeeded
peer from calling number 28:31:52:57:17:4E authorized
sent [IPCP ConfReq id=0x1 <addr 0.0.0.0> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
sent [IPV6CP ConfReq id=0x1 ]
rcvd [IPCP ConfReq id=0x1 ]
sent [IPCP ConfAck id=0x1 ]
rcvd [IPCP ConfNak id=0x1 <ms-dns1 61.128.128.68> <ms-dns2 61.128.192.68>]
sent [IPCP ConfReq id=0x2 <ms-dns1 61.128.128.68> <ms-dns2 61.128.192.68>]
rcvd [LCP ProtRej id=0x3 80 57]
Protocol-Reject for 'IPv6 Control Protocol' (0x8057) received
rcvd [IPCP ConfAck id=0x2 <ms-dns1 61.128.128.68> <ms-dns2 61.128.192.68>]
local IP address
remote IP address
primary DNS address 61.128.128.68
secondary DNS address 61.128.192.68
Script /lib/netifd/ppp-up started (pid 9266)
Script /lib/netifd/ppp-up finished (pid 9266), status = 0x9

@huipengly
Copy link
Contributor

huipengly commented Nov 27, 2017

那就确定pppoe.log只记录pppoe服务器接收到拨号,或者路由器端进行pppoe拨号的日志。现在猜不到楼主的问题了。只能等楼主传日志了。

@gmfghp
Copy link
Contributor

gmfghp commented Nov 28, 2017

在 while do 下边加一句

userinfo=$(grep 'user=' /tmp/pppoe.log | grep 'rcvd' | tail -n 1)

然后在最后的 done前边再加一个 if语句 如下,这样就可以实现 每天 清理一次 pppoe.log了

#close pppoe if log fail
if [ -z "$(ifconfig | grep "netkeeper")" ]
then
	ifdown netkeeper
else
	sleep 50
fi

#clear logs everyday
if [ "$(date '+%T' | cut -b 1-4)" == "00:0" ]
then
	cat /dev/null > /tmp/pppoe.log
	sleep 10
	echo "$userinfo" >> /tmp/pppoe.log
	sleep 10
fi

done

亲测可用,有问题的话 大家再研究下。

@xinxin8816
Copy link
Author

xinxin8816 commented Nov 29, 2017

@huipengly NK4运行时,使用的grep命令在日志文件过大时也会占用大量CPU,此外,SWAP缓存kswapd0进程也会占用CPU。
我修改后运行稳定,没发现问题。PPPOE日志几秒清除一次,留不下来太长的日志内容。
不过大致内容如下,内容无限重复:

sent [LCP ConfReq id=0x1 <mru 1492> <magic 0xae9925d5>]
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0xae9925d5>]
LCP: timeout sending Config-Requests
Connection terminated.
Modem hangup
Plugin rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src c0:61:18:bd:2b:8e
 [service-name] [host-uniq  9c 1d 00 00]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 62
 dst c0:61:18:bd:2b:8e  src cc:1a:fa:e9:6c:40
 [host-uniq  9c 1d 00 00] [AC-cookie  00 00 28 16 5a 1f 24 58 00 00 20 bc 6a 5a a8 be] [AC-name JX-NC-HGT-BAS-1.MAN.m6000S] [service-name]
Send PPPOE Discovery V1T1 PADR session 0x0 length 32
 dst cc:1a:fa:e9:6c:40  src c0:61:18:bd:2b:8e
 [service-name] [host-uniq  9c 1d 00 00] [AC-cookie  00 00 28 16 5a 1f 24 58 00 00 20 bc 6a 5a a8 be]
Recv PPPOE Discovery V1T1 PADS session 0x399 length 12
 dst c0:61:18:bd:2b:8e  src cc:1a:fa:e9:6c:40
 [host-uniq  9c 1d 00 00] [service-name]
PADS: Service-Name: ''
PPP session is 921
Connected to cc:1a:fa:e9:6c:40 via interface eth0.2
using channel 10596
Using interface pppoe-wan
Connect: pppoe-wan <--> eth0.2
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0xe57824bb>]
rcvd [LCP ConfReq id=0x1 <magic 0x641c109b> <mru 1492> <auth chap MD5>]
No auth is possible
sent [LCP ConfRej id=0x1 <auth chap MD5>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0xe57824bb>]
rcvd [LCP TermReq id=0x2]
sent [LCP TermAck id=0x2]

应该是PPPOE服务器在不停尝试超时重连一个接口
可能亦有系统关系,@Warm-rain说十分钟自动清除日志的情况在我这里不存在。
我也尝试改过PPPOE的设置。一开始打算通过注释掉超时参数来减少日志量,但没什么效果。

debug
logfile /tmp/pppoe.log
noipdefault
noaccomp
nopcomp
nocrtscts
lock
maxfail 0
#PPP超时参数设置
#lcp-echo-failure 40  #发送间隔秒
#lcp-echo-interval 0  #重复次数
show-password
show-password

@huipengly
Copy link
Contributor

可能跟不同环境有关系吧。想了一下,其实pppoe.log内容不需要存什么,日志里的账号密码存到/etc/config/network里了,然后当前账号也存在username_old这个变量里了,感觉直接在sleep前加上清除就够了。cat /dev/null > /tmp/pppoe.log,就加这一句。我没条件测试了,你试试这样可以不?

@xinxin8816
Copy link
Author

xinxin8816 commented Nov 29, 2017

@huipengly 简单加一句应该是不行的,因为NK4还要重新读取最后一次拨号的用户名存到username_old变量,判断是否与目前已连接接口的用户名一致,如果加了这一句应该会发生连上后十秒断网的情况

@xinxin8816
Copy link
Author

@huipengly 受到启发后我又去找PPPOE的问题,发现可能是由于系统接口WAN的协议配置为PPPOE,服务器尝试拨号此接口,而不停超时重试。不知道 @gmfghp 是否遇到相同问题,能否去试一下。

@huipengly
Copy link
Contributor

我感觉可行,你看看我这推断对不对,当时百度着写的,有些也记不太清楚了。
下面是代码16-19行,这个应该是如果日志里有账号,就赋值给username,如果没有,username就不变,是当前拨号的账号了。
清除了日志,并没有新拨号的情况下,在进行下面判断时,不会修改username,username和username_old就是一致的,不会断网。

    if [ "$(grep 'user=' /tmp/pppoe.log | grep 'rcvd' | tail -n 1 | cut -d \" -f 5)" == "]" ]
    then
        username=$(grep 'user=' /tmp/pppoe.log | grep 'rcvd' | tail -n 1 | cut -d \" -f 2)
    fi

@xinxin8816
Copy link
Author

@huipengly 我有这个假想是因为最初发现日志过大这个问题时,就是手动删除日志文件,删除后必断网。看程序确实好像是不存在这个BUG,我明天试一下吧。

@gmfghp
Copy link
Contributor

gmfghp commented Nov 29, 2017

我已经脱坑了,没有实验环境了。。。
这个清除日志断网的问题已经解决了,定时清除日志,时间可以自己选,最好是一天清除一次。看我上边的回复,或者我主页里面那个也行,我已经改到里面了。我亲自测试的,有问题再研究。

@LittleWhite-Carry
Copy link

@xinxin8816 原版本的东西应该不存在你说的问题。毕竟pppoe日志就是用来记录连接日志的。至于你说的那个WAN口猜想,可能是正确的。最后宣传一波,自己做了一个无心跳的多号多拨版本,可以试试。

@xinxin8816 xinxin8816 changed the title PPPOE拦截法 导致路由器负载过高而崩溃的修复 [已解决] PPPOE拦截法 导致路由器负载过高而崩溃的问题 Dec 7, 2017
@xinxin8816
Copy link
Author

@LittleWhite-Carry 想了解一下你的多拨版本,最近电信解除了一号一拨的限制

@xinxin8816 xinxin8816 reopened this Apr 23, 2018
@LittleWhite-Carry
Copy link

@xinxin8816 可以上我的主页去看。有已经打包好的

@xinxin8816
Copy link
Author

@LittleWhite-Carry 好的,谢谢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants