Contention using M-Vars (user-shared-memory) in usercode and background script plc

sutty · February 5, 2021

Hi!

In usrcode.c (user-written servo) I am reading from and writing to M-Vars that are mapped to locations in user shared memory. I am NOT using GetPtrVar() or SetPtrVar(). Instead I access the memory directly, like this:

address = 0x001880;

ushptr = (volatile unsigned int*)(pushm + address);

read:

u32v = (unsigned int)((*ushptr & (1u << start_bit)) >> start_bit);

write:

*ushptr |= mask;

In a script plc I do the same on such M-Vars. The m-variable definition looks like this:

M1->u.user:$001880.26.1
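
For readers following along, here is a minimal sketch of the shift-and-mask arithmetic that a definition like this implies, assuming the last two fields are the start bit (26) and the width in bits (1). The helper names field_mask, field_read and field_write are made up for illustration and are not part of the Power PMAC API.

```c
#include <stdint.h>

/* Mask covering `width` bits starting at `start_bit` (hypothetical helper). */
static uint32_t field_mask(unsigned start_bit, unsigned width)
{
    uint32_t ones = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << start_bit;
}

/* Extract the field from a copy of the register. */
static uint32_t field_read(uint32_t word, unsigned start_bit, unsigned width)
{
    return (word & field_mask(start_bit, width)) >> start_bit;
}

/* Return a new register value with only the field replaced.
   The caller still has to read the old word first and store the result
   afterwards, so the complete update is three steps, not one. */
static uint32_t field_write(uint32_t word, unsigned start_bit,
                            unsigned width, uint32_t value)
{
    uint32_t mask = field_mask(start_bit, width);
    return (word & ~mask) | ((value << start_bit) & mask);
}
```

With start bit 26 and width 1 the mask is 0x04000000, which is the value `mask` would need in the write shown above.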

In that PLC, M1 is reset once in a while.

Now and then, setting M1 within the uws does not succeed, and M1 remains 0.

By the way, using P-Vars instead of M-Vars avoids the issue; setting and resetting then works every time.

Regards, Anton

Ps. pp_proj.ini:

[CPU_AFFINITY]

rtpmac_main=0

servotask=1

rtitask=1

backgroundthread=0

Eric Hotchkiss · February 9, 2021

It may be easier to write to user shared memory from C like this. You would still have to shift and mask.

unsigned int *Udata;
Udata = (unsigned int *)pushm;
Udata[10]=99;

Do you still get the issue with this method? We use it in the ISR, so I would be surprised if it had issues.

In script, you can access the whole word easily like this

M1 -> Sys.Udata[10]

or a partial word by finding the address and then using it.

L0=Sys.Udata[10].a-Sys.pushm
L0
M1->U.USER:40.0.1
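
A small sketch of the offset arithmetic behind this, assuming each Sys.Udata element is a 32-bit word and the u.user address is a byte offset from the start of user shared memory (consistent with Udata[10] giving 40 here). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of Udata[index], assuming 4-byte elements. */
static size_t udata_byte_offset(size_t index)
{
    return index * sizeof(uint32_t);
}

/* Udata index that contains a given byte offset. */
static size_t udata_index(size_t byte_offset)
{
    return byte_offset / sizeof(uint32_t);
}
```

Under the same assumption, the offset $001880 from the original post corresponds to Udata index 1568.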

curtwilson · February 11, 2021

Any time you do read/modify/write operations (whether explicit or implicit) on the same register from different priority levels, you can have the problem you are seeing.

Let's say your background PLC has started to change the M-variable by reading the register, but is interrupted by the servo before it can write the modified value back to the register. The servo does its complete read/modify/write operation, then allows the background task to continue.

The background task then finishes the modify-and-write operations (using what it read before the servo interrupt), and its write operation overwrites what the servo task just wrote, wiping out its change. (It is counter-intuitive for most people that the higher-priority task fails in this type of conflict.)

Note that this does not happen with whole-word access that does not require this multi-step process. If memory is plentiful, the easiest thing is to devote an entire register to a variable, even if it is only using part of the register.
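
As a rough illustration of the whole-register approach (not code from any PMAC library), the sketch below gives each flag its own 32-bit word, so setting a flag is a single store that never reads the old value. It assumes an aligned 32-bit store completes in one step on the target CPU. The word indices and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout: one full word per flag. */
enum { CAPTURE_WORD = 0, STATUS_WORD = 1, FLAG_WORDS = 2 };

/* Whole-word write: one store, no read/modify/write of shared bits. */
static void flag_store(volatile uint32_t *words, unsigned index, uint32_t value)
{
    words[index] = value;
}

/* Replays an interleaving in a single thread: the background task has
   decided to clear STATUS, the servo runs first and sets CAPTURE, then
   the background store happens. Returns the final CAPTURE value. */
static uint32_t replay_whole_word(void)
{
    volatile uint32_t words[FLAG_WORDS] = { 0u, 1u };

    flag_store(words, CAPTURE_WORD, 1u);  /* servo interrupt sets its flag */
    flag_store(words, STATUS_WORD, 0u);   /* background finishes its write */

    return words[CAPTURE_WORD];           /* servo's write is still there */
}
```

Because the background store touches a different word, it has no stale copy of the servo's flag to write back.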

sutty · February 16, 2021

Thank you, Curt, for clarifying!

Besides your suggestions (whole-word access) I don't see any other way to go.

Anyway, it will do, we got some space left ;-)

sutty · February 18, 2021

Curt, will your suggestion to use whole words (avoiding the multi-step process) solve this issue in all cases (100%)?

I am asking because in our case:

uws checks a capture flag. If the capture has happened, it sets variable M1=1.

bgplc polls M1. After it sees M1==1 it does stuff and at the end resets M1=0.

But in very rare cases M1 never gets set even though the capture has happened.

Is it possible that, while bgplc is only reading M1, the interrupting uws cannot change M1 at all?!

What do you mean by implicit/explicit operations?

Would it also help to use a self-defined M1->*u instead?

Eric Hotchkiss · February 18, 2021

Curt, will your suggestion to use whole words (avoiding the multi-step process) solve this issue in all cases (100%)?

I am asking because in our case:
uws checks a capture flag. If the capture has happened, it sets variable M1=1.
bgplc polls M1. After it sees M1==1 it does stuff and at the end resets M1=0.
But in very rare cases M1 never gets set even though the capture has happened.

Is it possible that, while bgplc is only reading M1, the interrupting uws cannot change M1 at all?!

Is it possible that the script PLC had not yet set M1 to 0 when the User Written Servo set it to 1?

What do you mean by implicit/explicit operations?

If you define an M-Variable to a single bit of a larger register (M1->u.user:$001880.26.1) and then set that bit to a value, there is an implicit operation. PMAC still has to read the whole value, change one bit and write that new value.

In your C code you are doing the same thing explicitly.

Either way there is a chance for a higher-priority task to interrupt after the read and before the write. When the lower-priority task then completes its write, it overwrites the higher-priority task's changes.

Would it also help to use a self-defined M1->*u instead?

This should not make a difference.

sutty · February 19, 2021

Eric,

uws sets M1=1 only if it reads M1==0 & Capture has happened

bgplc resets M1=0 only if it reads M1==1.

What I was asking is: can uws fail to write M1=1 at all (in the situation where uws interrupts bgplc while bgplc is only reading M1, and a capture has happened, so uws wants to set M1=1)?

Are M1->*u and M1->u.user:$001880.0.32 (whole-word access) both possible solutions for this issue, instead of M1->u.user:$001880.26.1 (bit access, multi-step process)?

Regards,

Anton

curtwilson · February 19, 2021

Anton:

The problem can occur when a low-priority task has a multi-step operation that (1) reads a register; (2) does logical operations based on what it has read; and (3) writes back to that register based on what it read in (1) and what it decided in (2).

The most common, but not the only, logical operation here is the modification of a subset of the register. The simple act of writing to a partial-word M-variable does this implicitly and automatically. It can also be done with explicit code. (Writing to a whole-word M-variable does not invoke this multi-step process.)

The problem occurs when the low-priority task is interrupted between (1) and (3) by a high-priority task that writes to the same register. That task does write to the register successfully, but as soon as the low-priority task resumes, its step (3) will undo what the high-priority task has just done.

Note that just having the low-priority task read the register at this time does not create the problem -- it is the write operation that creates the problem.
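
The sequence above can be replayed deterministically in one thread. This is only a sketch under stated assumptions: the "interrupt" is an ordinary function call placed between steps (1) and (3), BG_BIT is a hypothetical second bit in the same register, and none of the names come from the PMAC headers.

```c
#include <stdint.h>

#define SERVO_BIT (1u << 26)  /* bit set by the high-priority task */
#define BG_BIT    (1u << 3)   /* hypothetical bit changed by the low-priority task */

/* High-priority task: its own read/modify/write runs to completion. */
static void servo_interrupt(volatile uint32_t *reg)
{
    *reg |= SERVO_BIT;
}

/* One low-priority pass with the servo interrupt forced to land between
   step (1) and step (3). write_back = 0 models a task that only reads.
   Returns the register value after the pass. */
static uint32_t low_priority_pass(volatile uint32_t *reg, int write_back)
{
    uint32_t seen = *reg;            /* (1) read the register           */
    uint32_t next = seen & ~BG_BIT;  /* (2) logic on the (now stale) copy */

    servo_interrupt(reg);            /* high-priority write happens here */

    if (write_back) {
        *reg = next;                 /* (3) write back, undoing SERVO_BIT */
    }
    return *reg;
}
```

With write_back set, the servo's bit is gone after the pass; with a read-only pass it survives, which matches the note that the write operation is what creates the problem.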

sutty · February 23, 2021

Hi Curt!

I took a plot of what happens here:

m2->u.user:$1884.26.1

plc4: increments p1+=1 and resets m2=0 if m2==1

uws: increments p2+=1 and sets m2=1 if m2==0

With Sys.BgSleepTime=1000, the difference between counters p1 and p2 remains 0.

With Sys.BgSleepTime=250, p2 gets ahead of p1 by about 1 every 10 s.
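
For comparison, a single-threaded sketch of the same handshake with m2 held in a whole word. It does not model preemption inside a step or the real task timing, and the struct and function names are invented for illustration. It only shows that when each side writes the flag with a plain whole-word store, p2 can never lead p1 by more than the one capture that is still pending.

```c
#include <stdint.h>

typedef struct {
    volatile uint32_t m2;  /* whole 32-bit word used as the flag */
    uint32_t p1;           /* counter incremented by the background PLC */
    uint32_t p2;           /* counter incremented by the servo routine  */
} handshake_t;

/* Servo pass: if the flag is clear, count and set it. */
static void uws_step(handshake_t *h)
{
    if (h->m2 == 0u) { h->p2 += 1u; h->m2 = 1u; }
}

/* Background pass: if the flag is set, count and clear it. */
static void plc_step(handshake_t *h)
{
    if (h->m2 == 1u) { h->p1 += 1u; h->m2 = 0u; }
}

/* Run `cycles` servo passes with one background pass every `bg_every`
   servo passes. Returns p2 - p1 at the end. */
static uint32_t run_handshake(unsigned cycles, unsigned bg_every)
{
    handshake_t h = { 0u, 0u, 0u };
    for (unsigned i = 0; i < cycles; ++i) {
        uws_step(&h);
        if (bg_every != 0u && i % bg_every == 0u) {
            plc_step(&h);
        }
    }
    return h.p2 - h.p1;
}
```

In this scheme the servo writes only when it reads 0 and the background task writes only when it reads 1, so an interrupt between the background task's read of 1 and its store of 0 finds the flag still set and leaves it alone.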

Discussion:

On the left, the plot below shows that plc4 increments p1 (purple) and resets m2=0, then uws increments p2 (orange) and sets m2=1.

On the right, plc4 increments p1 and resets m2=0, but it must have been interrupted by uws, which (for whatever reason) already sees m2=0 and so increments p2, but does NOT succeed in setting m2=1. plc4 then resumes at the point where it was interrupted and finishes up (without incrementing p1 again). At the next interrupt uws still sees m2=0 and does its job (increments p2 and sets m2=1) once more. Now p2 leads p1.

Is this what you expect it to do?!

I don't understand how uws can already see m2 equal to 0 even though plc4 has not finished its task.

Eric Hotchkiss · February 23, 2021

Could you add Sys.ServoBusyCtr to the plot?

sutty · February 24, 2021

Sys.ServoBusyCtr (black) remained 0 during gathering, increments slightly within minutes...

sutty · February 24, 2021

After more than 6 h of test runtime, the statistics say the following:

curtwilson · February 25, 2021

Anton:

To me, the big mystery of your results is this:

The data gathering you use to analyze the problem is done in the same servo interrupt as the UWS that is supposed to set the bit. If the problem is that the background task resumes and clears the bit, this would not have happened yet when the gathering was done. Very strange!

sutty · February 26, 2021

Hi Curt!

Apart from this mystery, what do you think: will I completely overcome this phenomenon by simply using whole words or P-Vars?

The attachment is a simple ppmac project that reproduces the issue on a UMAC ARM CPU.

P3=0 (whole-word access - no errors)

P3!=0 (bit access - erroneous)

I used a gate3 for the clocks.

No physical hw (motors ..)

Have a nice weekend!

VarSync_09_ppmac_from_scratch.zip

curtwilson · March 2, 2021

Whatever the details of the mechanism, changing parts of a register from multiple tasks that cannot be guaranteed to be fully separate in time is inherently problematic. The simplest fix is to use entire registers.
