random watchdog error
#1
Hi all
I have a strange watchdog error that I cannot really debug.
The situation is the following:
- I have a complete system (6 degrees of freedom) running with a Power Brick LV IMS
- The system is expanded with an EtherCAT network to read additional serial encoders and temperature sensors
- The system is considered finished, ready to be shipped to the customer
- The system is here in my lab for run-in tests, that is, moving the system day and night to try to detect errors or faults

The strange part is that watchdog faults appear at random and I cannot see where they come from: sometimes the system stays alive for 80 hours and nothing happens; then I perform a reboot, and after a few minutes the watchdog trips and I need to power-cycle the system.
How can I debug a situation like that? Is there a log inside the PMAC that tells us why the watchdog has tripped?

Thanks a lot
gigi
Reply
#2
If this always happens after a re-boot, always log the boot response and send the “offending” one to support@deltatau.com.
It may also be useful to have the serial port capturing any messages during operation.
Reply
#3
When it does watchdog, what type is it? If it is a "soft" watchdog, there is some debugging that can be done.

You can set up a gather of PMAC Status Elements and also of status elements for your own program (if you have a PLC that functions as a state machine, for instance, you can record which state you are in), and then, in a separate PLC, set Gather.Enable=3 to perform an indefinite gather. Once the status indicates that a watchdog has occurred, the gather can be stopped and the data can be analyzed on the computer.
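
For the analysis step on the computer, here is a minimal Python sketch of how one might pull out the last few samples before the watchdog flag first appears. The sample layout `(time_s, plc_state, watchdog_flag)` and the function name are hypothetical; the real columns depend on which elements you chose to gather and on how you exported the data.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout of one gathered sample: (time_s, plc_state, watchdog_flag).
Sample = Tuple[float, int, bool]


def samples_before_watchdog(samples: Sequence[Sample], window: int = 5) -> List[Sample]:
    """Return up to `window` samples preceding the first watchdog flag, plus that sample.

    Returns an empty list if the watchdog flag never appears in the data.
    """
    for i, (_, _, watchdog_flag) in enumerate(samples):
        if watchdog_flag:
            return list(samples[max(0, i - window):i + 1])
    return []


if __name__ == "__main__":
    # Made-up data: the state machine moves from state 2 to state 7 when the flag rises.
    data = [
        (0.0, 1, False),
        (0.1, 2, False),
        (0.2, 2, False),
        (0.3, 7, True),
        (0.4, 7, True),
    ]
    for time_s, state, flag in samples_before_watchdog(data, window=2):
        print(f"t={time_s:.1f} s  state={state}  watchdog={flag}")
```

Looking at which state the program was in just before the flag is usually more telling than the flag itself.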

The most common cause of a hard watchdog is, ultimately, the 5V supply on the unit dipping too low. On a Clipper, this may be easy to troubleshoot (as the 5V is brought in directly), but on other form factors, it may be harder to address, as the 5V is likely stepped down from an external 24V supply.
Reply
#4
(02-28-2018, 12:09 PM)AAnikstein Wrote: When it does watchdog, what type is it? If it is a "soft" watchdog, there is some debugging that can be done.

At the moment I am debugging the system without the IDE; only our custom software is connected. We see the front watchdog LED switch on, then after a few moments the system reboots by itself. From our software, I can see that sys.uptime restarts from zero.
The debugging is painfully slow, since the watchdog trips only after some hours of work: as an example, the system has just gone into error after 12400 s (about 3.4 hours) of use. I believe that a power supply fault should power off the system immediately, correct?
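
Since the trips are hours apart, it can help to let the host log the uptime and flag the resets automatically. A minimal Python sketch of that idea follows, assuming the custom software can hand over `(host_time_s, controller_uptime_s)` pairs; how the uptime is actually read from the controller is not shown, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def find_reboots(samples: Iterable[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return (host_time, last_uptime_before_reset) for every detected reboot.

    `samples` are (host_time_s, controller_uptime_s) pairs in chronological order.
    A reboot is assumed whenever the reported uptime goes backwards.
    """
    reboots = []
    prev_uptime = None
    for host_time, uptime in samples:
        if prev_uptime is not None and uptime < prev_uptime:
            reboots.append((host_time, prev_uptime))
        prev_uptime = uptime
    return reboots


if __name__ == "__main__":
    # Made-up samples: uptime climbs to 12400 s, then restarts near zero.
    log = [(0.0, 12380.0), (10.0, 12390.0), (20.0, 12400.0), (60.0, 15.0), (70.0, 25.0)]
    for host_time, last_uptime in find_reboots(log):
        print(f"reboot seen at host time {host_time:.0f} s, last uptime {last_uptime:.0f} s")
```

Collecting the last uptime before each reset over several days shows whether the trips cluster around a particular run time or really are random.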

thanks a lot
gigi
Reply
#5
Ciao Gigi,
it must be something in Lecco's area (joke)
I experienced a similar problem, albeit with a different CPU (UMAC 465), when using an EtherCAT network.
After some weeks of debugging we came to the conclusion, together with DTCH, that there is something in the critical interrupt routine that causes a kernel panic under these conditions. Initially I thought it was related to the number of EtherCAT axes (16) I was using, but then it also happened (apparently at random) on "lighter" machines (just the WD, not the reboot).
So it could be an idea to turn the critical interrupt off.

Ciao
Andrea
Reply
#6
Ciao Andrea et al.

(03-08-2018, 05:53 AM)tecnico Wrote: ...
After some weeks of debugging we came to conclusion together with DTCH that there
...

I installed the patch that disables the interrupt one week ago, and since then the system has not encountered any WD trip, reboot, or anything else strange.
I would say now that the problem is fixed.

Many thanks for the help, I would never have fixed that in time for delivery.

DeltaTau guys: is it possible to make this update something "official" and known to public?

Ciao
gigi
Reply
#7
Hello guys,

I have the same problem here: two PowerBrickAC-based systems with multiple axes falling into the hardware watchdog state at random.
Having a lot of C code, both in background programs and in the RTI, I am familiar with DT software watchdogs, but hardware ones? I have no idea how to debug them.

I am curious about your solution, disabling the critical interrupt. How did you do this? You mentioned a patch; could you tell me where you found it?

Thanks a lot !

Johann
Reply
#8
OK, I found what my problem was. Using the Linux "top" and "watch -n 0.5 cat /proc/xenomai/stat" commands, I could see that one of my debug processes was overloading the CPU (idle dropped to less than 1%). Disabling this debug process (basically high-frequency logging) let the idle rise back to 40%: no more hardware WD.
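
To keep an eye on this over a long run instead of watching "top" by hand, here is a minimal Python sketch that computes the CPU idle percentage from two snapshots of the standard Linux /proc/stat file. It assumes Python is available on the controller (or that the snapshots are copied to a PC), and the function names are hypothetical. Note that /proc/stat only reflects what the Linux side accounts for, so the figures may differ from the Xenomai view in /proc/xenomai/stat.

```python
import time
from typing import Tuple


def parse_cpu_line(stat_text: str) -> Tuple[int, int]:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Column 4 is idle, column 5 (if present) is iowait.
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # Guest columns are already counted in user/nice, so sum only the first 8.
            return idle, sum(values[:8])
    raise ValueError("no aggregate 'cpu' line found")


def idle_percent(before: str, after: str) -> float:
    """Idle percentage between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return 100.0 * (idle1 - idle0) / delta_total


def measure_idle(interval_s: float = 0.5, path: str = "/proc/stat") -> float:
    """Take two snapshots `interval_s` apart and return the idle percentage."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.read()
    return idle_percent(before, after)
```

Calling `measure_idle()` in a loop and writing a line to a file whenever the result falls below some threshold (say 5%, an arbitrary choice) gives a record to compare against the time of a watchdog trip. Keep the logging rate low, so that the monitor does not become the kind of load it is meant to detect.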
Reply

