I just watched logs fill up an entire hard drive, and I'm wondering if there is a better way to keep the log size under control.
Last night, just before I went to sleep, my router Pi suddenly started to work SLOWLY. I mean, really slowly. WiFi speed dropped to 100 kilobytes per second.
Since I wanted to go to bed, I just left it there.
The next morning, when I got up to deal with the problem, I found it had stopped answering DHCP requests, so I couldn't reach it over the network.
As it's headless, I couldn't see what was happening, so I pulled out the power cable.
Then it didn't boot up normally. I could hear the hard drive working constantly, so I guessed it was running fsck. I left it there and waited. After about 20 minutes the hard drive stopped, but the network interface still didn't come up. After thinking for a while, I pulled the power cable again. Maybe I could have plugged in a keyboard at that point and pressed Enter to see what would happen.
This time it booted up, and I got in over SSH. I headed straight for the logs. However, Vim couldn't open /var/log/syslog; it froze. The system itself was responding, as I could use ^a + c to start a new Screen window. I saw that Vim was busy with something, using up a whole CPU core. I killed Vim, then ran "ls -lh" on the logs. "syslog", "messages" and "kern.log" showed up at 109G. And "df -h" told me the root file system had no space left.
I erased these logs.
They were filled with error lines like the ones below. I don't much care about the errors themselves for now.
Feb 3 06:25:18 pi1 kernel: [23095.286352] ------------[ cut here ]------------
Feb 3 06:25:18 pi1 kernel: [23095.286371] WARNING: CPU: 1 PID: 11 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x348/0x370 [sch_hfsc]()
Feb 3 06:25:18 pi1 kernel: [23095.286378] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_TCPMSS xt_tcpudp xt_conntrack iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables cfg80211
Feb 3 06:25:18 pi1 kernel: [23095.286433] x_tables rfkill pppoe pppox ppp_generic slhc cls_u32 sch_hfsc sch_fq_codel bridge stp llc rtc_ds1307 sg 8192eu(O) asix libphy bcm2835_rng bcm2835_gpiomem i2c_bcm2708 uio_pdrv_genirq uio i2c_dev snd_bcm2835 snd_pcm snd_timer snd fuse ipv6
Feb 3 06:25:18 pi1 kernel: [23095.286577] CPU: 1 PID: 11 Comm: ksoftirqd/1 Tainted: G W O 4.1.16-v7+ #833
Feb 3 06:25:18 pi1 kernel: [23095.286596] Hardware name: BCM2709
Feb 3 06:25:18 pi1 kernel: [23095.286630] [<800180c0>] (unwind_backtrace) from [<80013b88>] (show_stack+0x20/0x24)
Feb 3 06:25:18 pi1 kernel: [23095.286660] [<80013b88>] (show_stack) from [<80554e10>] (dump_stack+0x80/0x98)
Feb 3 06:25:18 pi1 kernel: [23095.286691] [<80554e10>] (dump_stack) from [<80026970>] (warn_slowpath_common+0x8c/0xc8)
Feb 3 06:25:18 pi1 kernel: [23095.286716] [<80026970>] (warn_slowpath_common) from [<80026a68>] (warn_slowpath_null+0x2c/0x34)
Feb 3 06:25:18 pi1 kernel: [23095.286747] [<80026a68>] (warn_slowpath_null) from [<7f220658>] (hfsc_dequeue+0x348/0x370 [sch_hfsc])
Feb 3 06:25:18 pi1 kernel: [23095.286780] [<7f220658>] (hfsc_dequeue [sch_hfsc]) from [<8049d34c>] (__qdisc_run+0x40/0x1a0)
Feb 3 06:25:18 pi1 kernel: [23095.286809] [<8049d34c>] (__qdisc_run) from [<80479258>] (net_tx_action+0x1d0/0x274)
Feb 3 06:25:18 pi1 kernel: [23095.286835] [<80479258>] (net_tx_action) from [<8002a2c4>] (__do_softirq+0x124/0x334)
Feb 3 06:25:18 pi1 kernel: [23095.286860] [<8002a2c4>] (__do_softirq) from [<8002a514>] (run_ksoftirqd+0x40/0x6c)
Feb 3 06:25:18 pi1 kernel: [23095.286890] [<8002a514>] (run_ksoftirqd) from [<80045fec>] (smpboot_thread_fn+0x124/0x198)
Feb 3 06:25:18 pi1 kernel: [23095.286918] [<80045fec>] (smpboot_thread_fn) from [<800426d0>] (kthread+0xe8/0x104)
Feb 3 06:25:18 pi1 kernel: [23095.286944] [<800426d0>] (kthread) from [<8000f858>] (ret_from_fork+0x14/0x3c)
Feb 3 06:25:18 pi1 kernel: [23095.286963] ---[ end trace d0e5e44167ca3b56 ]---
I don't quite get the logging system's design. I thought it was a FIFO queue, where new messages overwrite the oldest ones. Apparently not: it just eats up the whole hard disk and dies.
I thought logs were there for rescuing the system: if something happened, I could check the logs, and the system would keep running the whole time. Now I know the logs can choke the system to death before the original problem does, and all I can do is examine the corpse.
So the joke is that it's better to be choked by logs than to let the actual problem trigger the nukes.
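To make clear what I mean by a FIFO-style log: once a size cap is hit, the oldest data gets thrown away. Here is a minimal Python sketch of that behaviour using the standard library's logging.handlers.RotatingFileHandler. This is only an illustration of the idea, not how the system log daemon writes /var/log/syslog; the logger name, the 1024-byte cap, and the backup count are made-up values.

```python
import logging
import logging.handlers
import os


def make_capped_logger(path, max_bytes=1024, backups=2):
    """Return a logger whose files stay within roughly
    max_bytes * (backups + 1) in total; the oldest backup is dropped."""
    logger = logging.getLogger("capped_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep demo lines out of the root logger
    # Drop handlers left over from an earlier call so lines are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    logger.addHandler(handler)
    return logger


def total_size(directory):
    """Sum the sizes of the regular files directly inside directory."""
    with os.scandir(directory) as entries:
        return sum(e.stat().st_size for e in entries if e.is_file())


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        log = make_capped_logger(os.path.join(d, "demo.log"))
        for i in range(1000):
            log.info("repeated kernel warning %d", i)
        print(sorted(os.listdir(d)), total_size(d))
        for h in list(log.handlers):
            h.close()
```

No matter how many lines are written, only the current file plus the configured number of backups remain on disk, which is the behaviour I had assumed the system logs already had.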
I'm thinking about mounting /var/log on a separate partition to limit it.
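Whichever way the limit ends up being enforced, I would also like an early warning before the disk is full. A rough Python sketch of the same check I did by hand with "ls -lh" and "df -h", assuming Python 3 is available on the Pi; the function names and the idea of running it periodically are my own assumptions, not an existing tool.

```python
import os
import shutil


def largest_files(directory, top=3):
    """Return up to `top` (size_in_bytes, name) pairs, biggest first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            # Skip directories and symlinks; only count real files.
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                entries.append((size, entry.name))
    return sorted(entries, reverse=True)[:top]


def usage_percent(path):
    """Percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    log_dir = "/var/log"
    print("%.1f%% of the file system holding %s is used"
          % (usage_percent(log_dir), log_dir))
    for size, name in largest_files(log_dir):
        print("%12d  %s" % (size, name))
```

With /var/log on its own partition, usage_percent("/var/log") would report on that partition alone, so a runaway log would show up there without touching the root file system.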
Do you have any better suggestions for keeping the logs from filling the disk?