Regular model B, rev. 2. This Pi is going to be my home thermostat for HVAC, so I need near perfect reliability.
I learned about the onboard watchdog timer, and implemented it yesterday. Today, the Pi locked up, and the watchdog did not reset it.
I am now thinking a hardware watchdog connected to P6 is the way to go. A quick search shows up the SwitchDoc Labs dual timer for $15.95. Does anybody have any experience with this device? Are there any others that people recommend?
JimR
Re: Hardware watchdog?
JimKD1YV wrote: ...so I need near perfect reliability...Does anybody have any experience with this device? Are there any others that people recommend?
I have no experience with the SwitchDoc, but it looks like you will also need a relay (or preferably a solid state device) for full, cold boot operation (i.e. to remove the power from the Pi and properly reboot).
You could make your own if you have a little experience.
You could use a cheap PIC as the watchdog as shown in this simple block diagram.
When power is applied to the system, a program runs in the PIC and turns on the power to the Pi. Power is only maintained if the Pi sends a pulse to the watchdog within (say) 2 minutes. If the PIC sees the pulse, its timer is reset and power is maintained for another 2 minutes.
If the PIC doesn't get a pulse on time, it shuts down the power, waits for (say 30 seconds) and then powers the Pi once again.
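The timing logic described above can be sketched as a small simulation. This is only an illustration in Python, not PIC or Picaxe firmware; the class name `PowerWatchdog` and its fields are hypothetical, and the 2 minute and 30 second figures are the "say" values from the description.

```python
from dataclasses import dataclass

TIMEOUT_S = 120   # "say 2 minutes" in the post
OFF_TIME_S = 30   # "say 30 seconds" in the post


@dataclass
class PowerWatchdog:
    """Simulated external watchdog; the caller supplies the time in seconds."""
    power_on: bool = True         # power is applied to the Pi at start-up
    deadline: float = TIMEOUT_S   # time by which the next pulse must arrive
    restore_at: float = 0.0       # time at which power is switched back on
    power_cycles: int = 0         # how many times the Pi has been power-cycled

    def tick(self, now: float) -> bool:
        """Advance the watchdog to `now`; return True if the Pi has power."""
        while True:
            if self.power_on and now >= self.deadline:
                # No pulse arrived in time: cut the power.
                self.power_on = False
                self.restore_at = self.deadline + OFF_TIME_S
                self.power_cycles += 1
            elif not self.power_on and now >= self.restore_at:
                # Off period is over: power the Pi again, restart the timeout.
                self.power_on = True
                self.deadline = self.restore_at + TIMEOUT_S
            else:
                return self.power_on

    def pulse(self, now: float) -> None:
        """The Pi signalled that it is alive: restart the timeout."""
        if self.tick(now):
            self.deadline = now + TIMEOUT_S
```

A real implementation would run the same two timers in the microcontroller and drive the mosfet from `power_on`.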
I'm using a cheap (£2) Picaxe, an IRFD9024 mosfet, and a 78B5 regulator to do something similar (see circuit here: http://captainbodgit.blogspot.co.uk/201 ... tails.html). Although my circuit is not a watchdog, the component arrangement is broadly similar, and it gives an idea of how few components would be needed (...once you have removed the bird box specific bits!).
Reliability needs to be seen in context: you want your heating system to be reliable for years to come, but nobody dies if it falls over. That said, the Raspberry Pi hardware seems remarkably reliable for such a cheap device, and it's probably more reliable than the typical black module used to power it!
Why does your Pi lock-up? Anything to do with electrical noise when switching HVAC?
The other big question mark is the software...
Re: Hardware watchdog?
The built-in watchdog is a hardware device, so if it failed it's likely to be a configuration problem: either the hardware was not initialised on boot, or the Pi did not crash badly enough to prevent the software daemon from continuing to reset the watchdog's hardware timer.
Have you tested your watchdog setup? If not, there's a package called "stress" which is useful.
Code:
sudo apt-get install stress
stress -c 50 # load CPU
stress -m 500 --vm-bytes 1M # load RAM
You can also simulate a complete lock-up by suspending the software daemon (SIGSTOP freezes it, so it stops resetting the hardware timer):
Code:
sudo kill -STOP $(pidof watchdog)
Re: Hardware watchdog?
SteveDee wrote: You could make your own if you have a little experience.
You could use a cheap PIC as the watchdog as shown in this simple block diagram.
I have experience and could do a lot. I'm trying not to invest the time and effort if this wheel has already been invented. I also do not own a PIC programmer, nor wish to learn all of the nuances of another PIC family.
SteveDee wrote: Reliability needs to be seen in context: you want your heating system to be reliable for years to come, but nobody dies if it falls over.
If the heating system fails and the pipes freeze and burst, there can easily be tens of thousands of dollars worth of damage. Not life-threatening, but not a risk I want to take.
SteveDee wrote: Why does your Pi lock-up? Anything to do with electrical noise when switching HVAC?
Not the HVAC. It's not even connected at the moment, it's on the workbench running in simulated mode with only the heat sensor and SSD relays attached. I'm running a 1500 mA power cube, so it should be more than sufficient.
Part of the problem with the Pi is that it really does not give any clues about its failure. I've pored over all of the logs, and can't find any indication. I've implemented a 10 minute heartbeat logger into cron just so I can find out the (nearly) true time of failure.
SteveDee wrote: The other big question mark is the software...
Always a question. I am running a Python package written by another experimenter. I am tweaking it and trying to understand it as I go.
JimR
Re: Hardware watchdog?
JimKD1YV wrote: ...If the heating system fails and the pipes freeze and burst, there can easily be tens of thousands of dollars worth of damage. Not life-threatening, but not a risk I want to take.
So you've carried out a risk assessment, and you have a $35 Raspberry Pi PC, set against potentially tens of thousands of dollars of damage if the system fails.
Clearly the Pi pc (running a desktop operating system) is a poor choice for this application, compared to (say) a micro-controller (a device dedicated to this single set of tasks).
Some heating systems (like ours) have a 'frost alarm'. It relies upon as few components as possible to fire up the boiler in the event of the temperature dropping a few degrees below freezing point in certain areas. The key here is that the less complex the arrangement, the more reliable it is going to be. It's not the cost of the Pi that is the problem, it is that it has too much complexity for this critical application.
SteveDee wrote: Why does your Pi lock-up?
I'd suggest you run your Pi without your controller software running for (say) at least 48hrs. If it doesn't crash, it's probably your code.
JimKD1YV wrote: Part of the problem with the Pi is that it really does not give any clues about its failure....
The Pi will give clues, but only for detectable conditions. However, if it hangs and the Task Manager graph (in the status bar) shows 100% CPU, then there is probably a process keeping it busy.
If it looks like your Python code is hanging, you need to add monitor points (timestamps in a text file is one approach) within the code, rather than using cron outside the code.
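A minimal sketch of such monitor points, assuming a plain text file is acceptable. The function name `checkpoint` and the default path are hypothetical; the `fsync` call is there so that the last line has a better chance of reaching the card before a lock-up.

```python
import os
import time


def checkpoint(label, path="/tmp/thermostat_checkpoints.log"):
    """Append a timestamped marker; the last line shows how far the code got."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as f:
        f.write(f"{stamp} {label}\n")
        f.flush()
        os.fsync(f.fileno())  # push the line to storage before carrying on
```

Calling `checkpoint("before sensor read")` and `checkpoint("after sensor read")` around each suspect step narrows down where the program stops.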
Good luck, & stay warm!
Re: Hardware watchdog?
TimG wrote: The built-in watchdog is a hardware device, so if it failed it's likely to be a configuration problem: either the hardware was not initialised on boot, or the Pi did not crash badly enough to prevent the software daemon from continuing to reset the watchdog's hardware timer.
I have had the Pi B lock up even with the watchdog in place. In at least one case it was due to a broken cifs mount. This was the #1 cause of stability issues I had on the Pi B: if the wifi connection fluctuated enough, the cifs mount would break, and the age-old issue of cifs mounts hanging would occur. Since my Pi uses that mount to store its recorded videos, this was a bit of an issue.
When I got my Pi2 it even did it once, so then I switched my shares to NFS and haven't had an issue. When the cifs breakdown occurred, the Pi's load would run up as high as 4 or 5. The problem is that if you set the watchdog to respond to a load that low, you may get a lot of false positives if you are doing anything video/camera intensive.
Re: Hardware watchdog?
JimR,
I also have been using a Pi for my thermostat. Yes, there are issues with using the Pi with a Linux OS, but with careful design you can make it work in an "embedded" environment.
A couple of observations and words of "collected" wisdom when I built my system, which has been running for almost two years now, during which I continued to learn and enhance the system. A fantastic learning experience for me.
I'm probably preaching to the choir, so I apologize beforehand.
The Pi watchdog you mentioned is indeed hardware, and I only use that for the case that Debian crashes. Realistically, this will not happen.
What is more likely is that your Python program crashes or gets into a loop or, when you use the web, it may wait forever if there is an internet issue.
I use a software watchdog to catch a program crash or hung program. My application writes the time of day into a json file, and a cron based program checks to see if too much time has passed in between. If so, it checks what is going on and can restart the application, or reset the Pi.
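A rough sketch of that kind of software watchdog, not the actual code from my system: the file path, the 5 minute limit and the function names are all assumptions for illustration. The application calls `write_heartbeat()` in its main loop, and the cron job calls `heartbeat_is_stale()`.

```python
import json
import os
import time

HEARTBEAT_FILE = "/tmp/thermostat_heartbeat.json"  # hypothetical location
MAX_AGE_S = 300                                    # hypothetical limit


def write_heartbeat(path=HEARTBEAT_FILE, now=None):
    """Called by the application: record the time it was last alive."""
    now = time.time() if now is None else now
    tmp = f"{path}.tmp"
    with open(tmp, "w") as f:
        json.dump({"last_seen": now}, f)
    os.replace(tmp, path)  # atomic swap, so the checker never reads half a file


def heartbeat_is_stale(path=HEARTBEAT_FILE, max_age_s=MAX_AGE_S, now=None):
    """Called from cron: True if the application has gone quiet for too long."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            age = now - json.load(f)["last_seen"]
    except (OSError, ValueError, KeyError, TypeError):
        return True  # a missing or unreadable file counts as a failure
    return age > max_age_s
```

What the cron job does when the heartbeat is stale (restart the application, or reboot) is left out here, since it depends on how the application is started.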
To avoid all that, you (obviously - sorry) need to design-in reliability and fault tolerance, which is easy to say, not so easy to do. Also, make elaborate use of the Try-Except feature, to surround all functions and all critical I/O and code pieces. Make sure you have good Except clauses that will allow you to continue running.
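As an illustration of an Except clause that lets the program keep running, here is a minimal sketch. The names `read_with_fallback` and `read_sensor` are hypothetical; the idea is simply to log the failure and fall back to the last good reading instead of crashing.

```python
import logging

log = logging.getLogger("thermostat")


def read_with_fallback(read_sensor, last_good):
    """Return a fresh reading, or the last good one if the read fails."""
    try:
        return float(read_sensor())
    except Exception:
        # Deliberately broad: any sensor failure is logged with a traceback
        # and the control loop carries on with the previous value.
        log.exception("sensor read failed; keeping last good value")
        return last_good
```

In a real system you would probably also count consecutive failures, so that a sensor that never recovers does not go unnoticed.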
I also avoided many pitfalls by putting all the critical elements that control the HVAC into a separate thread from all the other supporting code. This will make sure that your system will continue to run, even if other parts are not fully functioning.
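A minimal sketch of that separation using Python's `threading` module. The function names and the `step` callback (one pass of the HVAC control logic) are hypothetical. Note that a thread only isolates the control loop from exceptions and blocking calls elsewhere in the program; it does not survive the whole process dying.

```python
import logging
import threading

log = logging.getLogger("thermostat")


def control_loop(stop, step, interval_s):
    """Run the critical control step repeatedly until asked to stop."""
    while not stop.is_set():
        try:
            step()
        except Exception:
            log.exception("control step failed; will retry")
        stop.wait(interval_s)  # sleeps, but wakes immediately on stop.set()


def start_control_thread(step, interval_s=1.0):
    """Start the control loop in its own thread; return (thread, stop_event)."""
    stop = threading.Event()
    thread = threading.Thread(
        target=control_loop,
        args=(stop, step, interval_s),
        name="hvac-control",
        daemon=True,
    )
    thread.start()
    return thread, stop
```

The web interface, display and other supporting code then run in the main thread or in threads of their own.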
Use the logging feature to the max. It will give you a good idea where the problems are. If you also use logrotate, you can look back at days' worth of data.
Report all error conditions to the message log. It will allow you to see your errors among what the system reports.
To avoid too much wear and tear of the SD card due to the extensive logging, I suggest you use a RAM disk.
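A sketch of one way to set this up from inside Python. It uses the standard library's `RotatingFileHandler` rather than the system logrotate tool mentioned above, and it assumes `/dev/shm` is a RAM-backed filesystem, which is the case on most Linux systems but is worth checking on yours. The function name and size limits are hypothetical.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_logger(log_path="/dev/shm/thermostat.log", name="thermostat",
                max_bytes=256 * 1024, backups=3):
    """Return a logger that writes to a size-limited, rotating log file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding a second handler on repeat calls
        handler = RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backups
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Bear in mind that anything logged to a RAM disk is lost on a reboot or power cut, so the most important messages may still need to go somewhere permanent.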
Lastly, you need to have a reliable power supply that will weather brown outs and a loss of the mains. It must also restart your Pi reliably under all circumstances.
Make sure your system can function through a Shutdown and Reboot sequence. Preserve all settings and critical variables.
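One possible way to preserve settings across a reboot is sketched below, assuming the settings fit in a small JSON file. The function names and file layout are hypothetical. The file is written to a temporary name and then swapped in, so a power cut in the middle of a save leaves the previous copy intact.

```python
import json
import os


def save_state(state, path):
    """Write the settings to disk without ever leaving a half-written file."""
    tmp = f"{path}.tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic on the same filesystem


def load_state(path, defaults):
    """Read the settings back, falling back to defaults if anything is wrong."""
    try:
        with open(path) as f:
            stored = json.load(f)
    except (OSError, ValueError):
        return dict(defaults)
    if not isinstance(stored, dict):
        return dict(defaults)
    return {**defaults, **stored}
```

Unlike the logs, this file needs to live on permanent storage rather than a RAM disk.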
I have posted several bits of my system in this Forum, with excellent inputs from true experts, search for paulv.
At this moment I'm working on a detailed post that will be a how-to for a Pi supply design for embedded applications, and that uses some of the principles of an earlier post.
Stay tuned, and indeed, stay warm!
Re: Hardware watchdog?
JimKD1YV wrote: If the heating system fails and the pipes freeze and burst, there can easily be tens of thousands of dollars worth of damage. Not life-threatening, but not a risk I want to take.
A couple of things I'd like to point out.
The Pi 2 is a great device but as seen with the Flash issue, it may be slightly imperfect. There may be other bugs yet to be found, though I am not counting on it. As such, putting this $35 computer in charge of your home's HVAC may be a bit premature. At the very least, if the issue you mentioned did occur, and the Pi thermostat did not run the heat in time to prevent your pipes from freezing, causing significant damage, it's possible your homeowner's insurance would deny your claim because you're using a device that is untested [by the manufacturer for your application].
The nice thing about the way thermostats work though, is they are simplistic in the activation of the heating system. The easiest and best fix, imho, would be to get an old mercury style thermostat and wire it in parallel to the Pi. Set it lower than the Pi would ever start, say if the lowest temp the Pi would allow is 18.3deg Celsius/65deg Fahrenheit, set the mechanical thermostat to 15.5deg/60deg. In this way you would provide a mechanical failover in the event the Pi failed to do its job. If I were in an area where pipes froze on a regular basis, this is how I would handle it.
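Wiring the two thermostats in parallel means the heat runs if either one calls for it. A small sketch of that logic, using the example setpoints above; the function names are hypothetical, and hysteresis is ignored to keep it short.

```python
def calls_for_heat(temp_f, setpoint_f):
    """A thermostat closes its contact when the room is below its setpoint."""
    return temp_f < setpoint_f


def heat_on(temp_f, pi_alive, pi_setpoint_f=65.0, backup_setpoint_f=60.0):
    """Parallel wiring: the furnace runs if either thermostat calls for heat."""
    pi_call = pi_alive and calls_for_heat(temp_f, pi_setpoint_f)
    backup_call = calls_for_heat(temp_f, backup_setpoint_f)
    return pi_call or backup_call
```

With a dead Pi, `heat_on(63, pi_alive=False)` is False, but once the room drops to 59 the mechanical thermostat takes over and `heat_on(59, pi_alive=False)` is True.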