Many ESP8266 enthusiasts still struggle with unwanted resets and non-recoverable system crashes. That comes from my recent review of the most often expressed issues noted on-line. After investing significant time with this module, many have simply given up and moved on with other options.
That is too bad…
And while I have written several articles on the subject, I have to admit that my long-term ESP8266 reliability test also failed months ago, and the test had not been revisited. That is, not until now. The problem was that the original tests used an older, less reliable sketch structure.
With some much-needed improvements, my basic setup now provides a reliable and stable platform for IoT projects. That is the point of this writing. The following information should prove useful for anyone wanting to use the ESP8266 on a 24/7 basis with minimal or no down-time…
Here’s a reference to my past writings on this topic:
4 ways to eliminate ESP8266 Resets
A cleaner ESP8266-12 hardware setup
ESP8266 WIFI dropout-proof connectivity
Past Monitoring of the ESP8266 Performance
My original reliability test used a ThingSpeak channel to monitor the ESP8266 performance. This same channel will now be reused to evaluate the updated sketch structure. The longest run-time recorded during the original test was 28 days. Then, the ESP8266 crashed fatally and could not recover. Yet every 24 hour period in the 28 day span recorded at least 10 ESP8266 resets. Fortunately, the sketch was structured to recover from a reset. That is, until the fatal crash occurred during day 28.
This Time Should Be Better
The original sketch used polling to check for http server requests. This was executed each loop() cycle. I used this structure based on project examples found on-line. But the design is flawed. A much more reliable approach is to set up a separate event-driven callback function outside the sketch loop() function to respond to http requests. This post presents a more detailed description of this http server sketch structure using the ESP8266. Because the callback is triggered by the arrival of a request rather than from within loop(), its responsiveness does not depend on the time required to execute the sequential steps in the loop() function.
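To illustrate the difference in structure, here is a minimal sketch in portable C++. It is a model only, not the code from my sketch: HttpServerModel, SensorCache, and makeServer are hypothetical names, and deliver() stands in for whatever the network stack does when a request arrives. The point is that the handler is registered once and answers from cached values, so nothing in the request path waits on loop().

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the network stack's HTTP layer.
// This is NOT a real ESP8266 or Arduino API.
class HttpServerModel {
public:
    using Handler = std::function<std::string(const std::string& path)>;

    // Register the callback once, for example from setup().
    void onRequest(Handler handler) { handler_ = std::move(handler); }

    // Called by the (simulated) network stack when a request arrives.
    // Nothing here depends on loop() having run.
    std::string deliver(const std::string& path) const {
        if (!handler_) {
            return "503 no handler";
        }
        return handler_(path);
    }

private:
    Handler handler_;
};

// Values that loop() refreshes at its own pace (hypothetical).
struct SensorCache {
    int lastReading = 0;
};

// Builds a server whose callback replies from the cache.
// The cache must outlive the returned server.
inline HttpServerModel makeServer(const SensorCache& cache) {
    HttpServerModel server;
    server.onRequest([&cache](const std::string& path) {
        if (path == "/") {
            return std::string("200 reading=") + std::to_string(cache.lastReading);
        }
        return std::string("404 not found");
    });
    return server;
}
```

With this arrangement, loop() only has to update the cache; a slow pass through loop() delays the freshness of the data, not the reply.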
I also discovered that my USB to serial device was causing frequent ESP8266 resets. It also was failing frequently during sketch uploads.
This was replaced, and then removed once the final sketch was installed. The operating unit now only uses the 5V and Gnd wires from a USB cable. The 5V is fed through a voltage regulator to provide the 3.3V for ESP operation.
Monitoring the ESP8266
In this case, the ESP8266 functions as an http server. In addition, it periodically reads any attached sensors. These sensor values are returned as an http reply to a request. The current ESP8266 system time (seconds since the last reset) and the number of WIFI disconnect/reconnections are also returned; all within a JSON string.
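As a rough illustration of such a reply, the following portable C++ function assembles a JSON string from sensor values, the uptime in seconds, and the reconnect count. The function name and the field names uptime_s and wifi_reconnects are assumptions made for this example, not the names used in my sketch, and sensor names are assumed to need no JSON escaping.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Builds the JSON reply body. Field names are illustrative assumptions.
// Sensor names are assumed to contain no characters that need escaping.
std::string buildStatusJson(
    const std::vector<std::pair<std::string, double>>& sensors,
    std::uint32_t uptimeSeconds,
    std::uint32_t wifiReconnects) {
    std::string json = "{";
    for (const auto& sensor : sensors) {
        char value[32];
        // Two decimal places keeps the reply short and predictable.
        std::snprintf(value, sizeof value, "%.2f", sensor.second);
        json += "\"" + sensor.first + "\":" + value + ",";
    }
    json += "\"uptime_s\":" + std::to_string(uptimeSeconds) + ",";
    json += "\"wifi_reconnects\":" + std::to_string(wifiReconnects);
    json += "}";
    return json;
}
```

Calling buildStatusJson({{"temp", 21.5}}, 3600, 0) yields {"temp":21.50,"uptime_s":3600,"wifi_reconnects":0}.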
So the ESP8266 simply waits for requests and monitors sensors.
I have set up a CRON script, written in PHP, to request and record the current values from the ESP8266. The script records the values to both the ThingSpeak channel and a separate MySQL database. This is repeated once every hour, on the hour. A description of this process and the ThingSpeak channel is detailed in this post.
The Results
So far, the ESP8266 has been running continuously for almost 8 days without a single WIFI drop-out or ESP8266 reset. There is no reason to doubt that this system will run indefinitely. That is, until the power company delivers a disruption in service or my internet connection goes down.
In Closing
I hope this experiment offers encouragement to anyone using the ESP8266 who has been frustrated with unreliable performance. Check back periodically to see how long this module performs without crashing or resetting. Here is a quick link to the ThingSpeak channel monitoring the unit.
2016-Sep-23 Update:
The Santa Ana winds kicked up this afternoon and with rising temperatures, ACs were cranking everywhere here in Southern California. This triggered a one-minute power outage which also shut down the ESP8266 after a bit over 8 hours of continuous operation. A UPS on the ESP8266 power source is needed to prevent this from happening again.
2016-Oct-17 Update:
Five days ago I discovered that the ESP8266 data feed into the ThingSpeak channel was no longer updating values. After some troubleshooting, the root cause was isolated to my 24 port Cisco Ethernet Switch. The problem was not with the switch, but rather one of my devices connected to it. I removed all the connections and re-introduced the essential devices one by one, which restored the system to full operational capability. One of the devices was a WIFI access point. This device is used in my home network to extend the range of my WIFI coverage. And that device is what the ESP8266 connects to for network access. Unfortunately, while troubleshooting, a disturbance to the ESP8266 power source occurred, resetting the device after 17 days of continuous operation. I used this opportunity to install my new UPS unit to power the ESP and the network modem/router during power outages. This will hopefully eliminate this disruption source to my on-going ESP8266 up-time stress test.
I have an ESP-01 module that has been logging temp, humidity, and air pressure constantly for about 3 months. Although I don't specifically monitor resets, I have never experienced a hard lock up. It is running a basic program with no web interface, it just dumps data to my server via HTTP once every 5 minutes. It only stopped working when I changed my wifi password. I am however using the SDK directly with CHERTS dev kit. Pulling power from a computer's USB port is a very bad idea, and should be supplemented with some very large capacitors (batteries) as a computer USB port will not provide enough reliable power. Am I correct in understanding that you ran these tests while powered from a computer? My setup ran with a 3v wall wart: http://2xod.com/articles/ESP8266_and_BME280_sensor/
I never used a computer to power the esp. It was a 5v wall plug USB socket. The problem was that I was using a USB to serial adapter before using a voltage regulator to drop it down to 3.3v. The USB to serial adapter must have been drawing intermittent power surges that would cause the esp to reset. That problem was completely eliminated when I used an old USB cable with only the 5v/gnd wires connected.
Thanks for the excellent info. I have been interested in this and your callback enhancements for the Arduino IDE. Have you ever considered a similar callback library for the HTTP client side? My ESP8266 is doing the reverse, connecting to a server and retrieving info with HTTP GET on a regular basis. The polling architecture is ugly and I'd like to replace it with a callback for the http response. I've not done anything with the native SDK before though, and the kung fu may be a bit above my reach. I see at least one native SDK example here:
https://github.com/Caerbannog/esphttpclient
Thanks for the example. Just as with the http server, this code can likely be adapted to work with the Arduino IDE. Stay tuned, I may explore this possibility soon and share the results in a blog post.
I have found that many ESP8266 users shy away from the Espressif SDK. That is too bad as it provides a very powerful development platform. My toolchain uses the Eclipse IDE with the SDK.
Thank you for your deep analysis! It helped me a lot!
If you can adapt for the Arduino IDE.. that would be incredible!
Your comment is unclear as to your expectation. The sketch referenced in this post was developed using the Arduino IDE.
Thank you. I have the same problem so I was happy to find your article. I have tried the capacitor, regulated supply, a flush function, and added some delays. Still get random hangs. Using the Arduino WebServer example on D1 mini lite chips and WEMOS D1 and some little esp-01s. All were hanging. Usually getting 12-24 hours until hang. We will see how your suggestion goes. There is always the interrupt reset dog.
Longest duration I have experienced without a reset was 49 days. Even with a UPS powering my device and cable modem, I have observed occasional glitches in the internet service, which are out of my control and another cause of resets. If you are in a densely populated environment, competition for the limited wifi channels, even using your own private router, is another occasional cause of resets out of your control.
I have been thinking about trying a dual redundant system (2 ESPs) as a method of eliminating resets entirely, as the odds of both ESPs resetting at the same time are nearly zero.
Update. Another clue or new problem. I have 5 esp01 running on my local LAN. I rebooted the router and none came back online as responding web servers, but they reconnected to the router.
You might consider having your esp save some values to the flash memory (nvm). These values are saved even after power is removed. This would allow you to pinpoint where in the code it is hanging.
You can also, upon detection of loss of wifi, store the pertinent values identifying the state of the esp in nvm and force a reset instead of simply reconnecting. You get a clean start and maintain the state of the esp, as if nothing happened.
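Here is a minimal sketch of that idea in portable C++, with a byte array standing in for the flash area. The SavedState fields, the function names, and the choice of an FNV-1a checksum are assumptions for illustration; on a real module the image would be written with whatever flash or EEPROM routine your toolchain provides. The checksum lets the restart code tell a valid record from a blank or damaged one.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical state worth keeping across a forced reset.
struct SavedState {
    std::uint32_t checkpoint;      // ID of the last code location reached
    std::uint32_t wifiReconnects;  // running reconnect count
    std::uint32_t sensorIndex;     // which sensor was due next
};

constexpr std::size_t kPayloadSize = sizeof(SavedState);
constexpr std::size_t kRecordSize = kPayloadSize + sizeof(std::uint32_t);

// Byte image standing in for the nvm area (an assumption of this sketch).
using NvmImage = std::array<std::uint8_t, kRecordSize>;

// 32-bit FNV-1a hash over a byte range, used here as a simple checksum.
inline std::uint32_t fnv1a(const std::uint8_t* data, std::size_t length) {
    std::uint32_t hash = 2166136261u;
    for (std::size_t i = 0; i < length; ++i) {
        hash ^= data[i];
        hash *= 16777619u;
    }
    return hash;
}

// Serializes the state followed by its checksum into the image.
inline void saveState(const SavedState& state, NvmImage& image) {
    std::memcpy(image.data(), &state, kPayloadSize);
    const std::uint32_t sum = fnv1a(image.data(), kPayloadSize);
    std::memcpy(image.data() + kPayloadSize, &sum, sizeof sum);
}

// Restores the state; returns false if the checksum does not match.
inline bool loadState(const NvmImage& image, SavedState& state) {
    std::uint32_t stored = 0;
    std::memcpy(&stored, image.data() + kPayloadSize, sizeof stored);
    if (stored != fnv1a(image.data(), kPayloadSize)) {
        return false;
    }
    std::memcpy(&state, image.data(), kPayloadSize);
    return true;
}
```

After the forced reset, the startup code would call loadState() first and fall back to defaults when it returns false.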
One more thing: I like to limit the code in one iteration to, say, one sensor reading. Any more and you risk blowing the esp timeout and, as a result, getting a reset.
Good luck.
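The suggestion above to limit each loop() iteration to one sensor reading can be sketched as a simple round-robin in portable C++. RoundRobinSensors is a hypothetical helper, not part of any ESP8266 library, and the read functions stand in for real sensor drivers. Each call to step() reads exactly one sensor and returns, so a single pass through loop() stays short.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: reads one registered sensor per call to step().
class RoundRobinSensors {
public:
    using ReadFn = std::function<double()>;

    // Register a sensor read function; its cached value starts at 0.
    void add(ReadFn read) {
        readers_.push_back(std::move(read));
        values_.push_back(0.0);
    }

    // Call once per loop() pass: reads exactly one sensor, then returns.
    void step() {
        if (readers_.empty()) {
            return;
        }
        values_[next_] = readers_[next_]();
        next_ = (next_ + 1) % readers_.size();
    }

    // Most recent cached value for sensor i (throws if i is out of range).
    double value(std::size_t i) const { return values_.at(i); }

private:
    std::vector<ReadFn> readers_;
    std::vector<double> values_;
    std::size_t next_ = 0;
};
```

The trade-off is that each value is refreshed only once every N passes for N sensors, which is usually acceptable for slowly changing readings.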
I added your code in and only got an hour uptime out of the deal. I ran the reconnect wifi counter out to client.print and it stayed at zero. Now I am thinking it's stuck in a loop somewhere. I can still ping the device and the router says it's connected.
These are all good ideas. I shortened my code up to the original gpio/1 or gpio/0 instead of reading many empty pins. Then I got it started and banged on the F5 key 10 times and it quit.