DSPRelated.com
Forums

Atomic operations on C6x family of processors

Started by "Manuel E. Cotallo Torres" January 19, 2007
Hi,

Does anybody know how to perform atomic compare-and-swap operations, like
the cmpxchg instruction of the Intel x86 family, locking access to memory
for concurrent threads but without the cost of disabling/enabling
interrupts?

This is to implement POSIX mutexes on the DSP, without relying on DSP/BIOS.

Thx everybody,
Manuel Cotallo.
Manuel,

the only solution I know of is to pack all the code into the five so-called
delay slots of a branch instruction. These delay slots are not
interruptible. You will need to write a C-callable assembly function for this.

The problem is that memory reads also require 4 delay slots before the
destination register is valid. Even if you read the memory operand in
parallel with the branch instruction, there is only one instruction left,
but you will need at least two for a compare and a store. However, you can
put another branch into the delay slots of the first branch, which will
give you some extra non-interruptible cycles.

I don't know if the following function correctly emulates the cmpxchg
instruction (I am not familiar with the x86), but I hope it will help you
to implement it:

; C prototype: int cmpxchg (int cmp, int *dst, int src)
; parameters : cmp            in a4
;              pointer to dst in b4
;              src            in a6
; returns    : the value *dst held before the call

        .global _cmpxchg
        .sect   ".text"
        .newblock
_cmpxchg:
        b       .s2     $1              ; first branch: interrupts held off
        ldw     .d2     *b4, b0         ; b0 = *dst (valid after 4 delay slots)
        nop     2
        b       .s2     b3              ; return to caller
$1:     nop
        cmpeq   .l1     a4, b0, a1      ; b0 is now valid
 [a1]   stw     .d2     a6, *b4         ; equal: *dst = src
 [!a1]  mv      .l1     b0, a4          ; not equal: return old *dst
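
For comparison, the behaviour the routine above is aiming for can be written down as a plain C reference model. This is only a sketch of the intended semantics and is not atomic by itself: on the DSP the load, compare and store would still have to run inside a non-interruptible window. The names cmpxchg_model, try_lock and unlock are made up for illustration, and the 0 = free / 1 = taken lock encoding is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Reference model of compare-and-swap (NOT atomic by itself).
 * Same contract as the assembly routine: if *dst == cmp, store src;
 * in every case return the value *dst held before the call. */
int cmpxchg_model(int cmp, int *dst, int src)
{
    assert(dst != NULL);
    int old = *dst;          /* corresponds to the ldw   */
    if (old == cmp)          /* corresponds to the cmpeq */
        *dst = src;          /* corresponds to [a1] stw  */
    return old;
}

/* Hypothetical try-lock built on top: 0 = free, 1 = taken.
 * Returns 1 if the caller acquired the lock, 0 otherwise. */
int try_lock(int *lock)
{
    return cmpxchg_model(0, lock, 1) == 0;
}

/* Release the lock again. */
void unlock(int *lock)
{
    assert(lock != NULL);
    *lock = 0;
}
```

A mutex would then spin or yield while try_lock() returns 0; how to block a waiting thread is outside the scope of this sketch.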
Best Regards,
A. Klemenz, D.SignT

-------------------------------
D.SignT - Digital Signalprocessing Technology GmbH & Co. KG

Adolf Klemenz

Gelderner Str.36
D-47647 Kerken

phone (+49)(0)2833/570-976
fax (+49)(0)2833/3328
email mailto:a...@dsignt.de
web http://www.dsignt.de
-------------------------------
Since the C6x will not take an interrupt while a branch is in progress, I often take advantage of that for these cases.

I do it in assembly: write a tiny assembly routine that immediately branches back and does its work during the 5-cycle uninterruptible return latency.

Tom Kerekes

Manuel-

Adolf's solution is very creative. If you get this to work, please let us know.
Thanks!

-Jeff
Dear All,

An interesting discussion. Let me add my $0.02 :)

I snipped out large portions of the reply posts just to get to
the essence of the problem.

The original question posted was:

> Posted by: "Manuel E. Cotallo Torres" m...@sicubo.com
>
> how to perform atomic compare-and-swap operations, like in
> Intel x86 family cmpxchg operations, locking the access to memory for
> threaded concurrent operations but without the impact of
> disabling/enabling interrupts?

These are the proposed solutions:

> Posted by: "Adolf Klemenz" a...@dsignt.de
>
> the only solution I know is to pack all the code into the five so called
> delay slots of a branch instruction. These delay slots are not
> interruptable.

> Posted by: "Tom Kerekes" t...@yahoo.com TKSOFT
>
> Since the C6x will not interrupt while a branch is in progress, I often take
> advantage of that for these cases.

> Posted by: "Jeff Brower" j...@signalogic.com jbrower888
>
> Adolf's solution is very creative. If you get this to work, please let us know.

Yep, it is a clever way, and no doubt it does work, but it does not answer
the original question, not by a micron :)

The whole point is that the C6000 does not have an atomic compare-and-swap
operation. Generally, it is not possible to obtain exclusive access to
memory in a multitasking mode on the C6000 architecture.

Any solution is a variant of the same algorithm: 1) disable interrupts,
2) do whatever is needed in single-process mode, and 3) restore interrupts.

It really does not matter how interrupts were disabled; the point is
that they were disabled.
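
A minimal C sketch of that three-step pattern, applied to compare-and-swap. It runs on a host PC against a simulated CSR, so sim_csr, sim_irq_disable and sim_irq_restore are stand-ins invented for this example; on the target they would be replaced by whatever really clears and restores GIE (the CSL pair IRQ_globalDisable()/IRQ_globalRestore() shows up later in this thread).

```c
#include <assert.h>
#include <stddef.h>

#define CSR_GIE 0x1u             /* GIE is bit 0 of the CSR */

unsigned sim_csr = CSR_GIE;      /* simulated CSR, interrupts enabled */

/* Stand-in for step 1: clear GIE and return the previous CSR value. */
unsigned sim_irq_disable(void)
{
    unsigned prev = sim_csr;
    sim_csr &= ~CSR_GIE;
    return prev;
}

/* Stand-in for step 3: put GIE back to what it was before. */
void sim_irq_restore(unsigned prev)
{
    sim_csr = (sim_csr & ~CSR_GIE) | (prev & CSR_GIE);
}

/* Compare-and-swap done as a short critical section.
 * Returns the value *dst held before the call. */
int cas_with_irq_off(int cmp, int *dst, int src)
{
    assert(dst != NULL);
    unsigned prev = sim_irq_disable();   /* 1) disable interrupts     */
    int old = *dst;                      /* 2) read, compare, write   */
    if (old == cmp)
        *dst = src;
    sim_irq_restore(prev);               /* 3) restore interrupts     */
    return old;
}
```

Restoring the saved state, rather than unconditionally setting GIE, keeps the routine safe to call from code that already runs with interrupts off.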

So I wouldn't try to do it in the complicated way of nesting several branches,
which is, from my point of view, error-prone programming; instead I would
use the plain way of changing GIE. OTOH, doing something during branch
delay slots is a completely valid method, no doubt.

I feel uncomfortable that I used so many imperative statements in my email,
but it was only for the purpose of clarifying this interesting problem. Please
accept my apologies.

Rgds,

Andrew
Dear Andrew,

of course you're right - using the branch delay slots is effectively
identical to disabling interrupts and is by no means a real atomic
operation. However, safely disabling interrupts by clearing the global
interrupt enable (GIE) bit in the C6000 control status register (CSR)
requires some precautions: if an interrupt occurs simultaneously with the
mvc instruction which writes the CSR, the interrupt is still taken!
Additional nops are required between clearing GIE and the
read/modify/write of a memory location.

On the other hand, interrupts are automatically disabled once a branch
instruction is in the pipeline. This is a feature of the C6000 core
hardware which can be used to emulate atomic operations, although it surely
has never been intended for this purpose.

I don't know how x86 or similar CPUs implement atomic operations, but it is
quite likely that these will suspend interrupts in hardware too. From a
code portability point of view, I admit, using the branch delay slots is
bad practice!

Best Regards,
Adolf Klemenz, D.SignT
At 03:02 22.01.2007 -0800, Andrew Nesterov wrote:
>[...]
Adolf,

I totally agree. That reminds me of a problem I had with disabling interrupts on the C6x that drove me CRAZY.

On one system I use an interrupt and my own OS to switch tasks (threads). Common sense would indicate that if the interrupt occurred, interrupts had to be enabled - not true. Because of the latency you describe, the interrupt can occur after interrupts are disabled. In this case, when returning from the interrupt, interrupts need to be disabled instead of enabled. The C6x handles this with the PGIE bit.

My problem occurred when I was interrupted in code that had just turned off interrupts and was expecting to quickly turn them back on, but the interrupt then switched to another thread that expected interrupts to just be on. The result was that interrupts were turned off permanently! After that I decided to leave GIE alone whenever possible...
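
The failure described above can be reproduced with a small host-side model of the GIE/PGIE handshake (on interrupt entry PGIE takes the value of GIE and GIE is cleared; on return GIE takes the value of PGIE). This is a sketch under assumptions: the function names are invented, and switch_per_task shows just one possible remedy, treating PGIE as part of each task's saved context. It is not necessarily what was done in the system described here.

```c
#include <assert.h>
#include <stddef.h>

#define GIE_BIT  0x1u   /* CSR bit 0 */
#define PGIE_BIT 0x2u   /* CSR bit 1 */

/* Interrupt entry as done by the hardware: PGIE <- GIE, GIE <- 0. */
unsigned irq_entry(unsigned csr)
{
    unsigned pgie = (csr & GIE_BIT) ? PGIE_BIT : 0u;
    return (csr & ~(GIE_BIT | PGIE_BIT)) | pgie;
}

/* Return from interrupt (B IRP): GIE <- PGIE. */
unsigned irq_return(unsigned csr)
{
    unsigned gie = (csr & PGIE_BIT) ? GIE_BIT : 0u;
    return (csr & ~GIE_BIT) | gie;
}

/* Naive task switch inside the ISR: the CSR is left untouched, so the
 * incoming task inherits the outgoing task's PGIE. */
unsigned switch_naive(unsigned csr_in_isr)
{
    return irq_return(csr_in_isr);
}

/* Possible remedy: keep PGIE per task. Save the outgoing task's bit,
 * load the incoming task's bit, then return from the interrupt. */
unsigned switch_per_task(unsigned csr_in_isr, unsigned *out_saved,
                         unsigned in_saved)
{
    assert(out_saved != NULL);
    *out_saved = csr_in_isr & PGIE_BIT;
    unsigned csr = (csr_in_isr & ~PGIE_BIT) | (in_saved & PGIE_BIT);
    return irq_return(csr);
}
```

With a CSR of 0 at interrupt entry (the task had just cleared GIE when the interrupt was taken anyway), switch_naive() hands the next task a CSR with GIE still clear, which is the "interrupts off permanently" case; switch_per_task() with the incoming task's saved PGIE set returns with GIE set.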

Tom Kerekes
DynoMotion, Inc.
----- Original Message ----
From: Adolf Klemenz
To: Andrew Nesterov ; c...
Sent: Monday, January 22, 2007 6:32:40 AM
Subject: Re: [c6x] Re: Atomic operations on C6x family of processors

[...]
I have been watching this entire thread related to interrupts, disabling
interrupts, and the time required to switch contexts with interest. My
project is running on a board utilizing a 6713 DSP.

One of the interrupt service routines I'm using services the serial
port. The serial port is configured with the FIFO enabled, so that it
/shouldn't/ send an interrupt for every character received. I think it's
configured so that it should only send an interrupt when the buffer
reaches a threshold, or data has sat in the buffer for a certain amount
of time.

I did have the routine that retrieves the data declared like:

    interrupt void UartRecieveISR(void)
    {
        Uint32 gie = IRQ_globalDisable();

        // do stuff here..

        IRQ_globalRestore(gie);
    }

I was manually disabling the interrupts while I was handling the
interrupt, because I wasn't sure whether the UART might
generate a second interrupt while I was pulling the data out of it.

Is what I was doing wrong?

Can a second interrupt of the same type happen while you are handling
the first one? If so, is there a sort of context stack that allows
recovery back to the main thread of execution?

I am doing very little in my ISR, just pulling data out of the UART and
incrementing a pointer to the buffer that the data is located in. I've
declared my ISR, the Buffer it uses, and the pointer variables it uses,
to all live in the internal ram of the DSP.

A follow up question related to interrupts and interrupt handling has to
do with using TI's library FFT functions, which are generally declared
as "Interruptibility: The code is interrupt-tolerant but not interruptible"

The documentation also declares:
> Non-interruptible: These functions disable interrupts for nearly their
> entire duration. Interrupts may happen for a short time during their
> setup and exit sequence.
> Note that all three categories tolerate interrupts. That is, an
> interrupt can occur at any time without affecting the correctness of
> the function. The interruptibility of the function only determines how
> long the kernel might delay the processing of the interrupt.

Does this mean that an interrupt that occurs while one of these routines
is running will get serviced when the routine finishes? Or will it simply be missed?

Wim.
Interrupts cannot happen within an ISR, because on entry GIE is copied into
PGIE and GIE is turned off; it stays off unless you re-enable it. So there is
no need to turn off GIE in an ISR, as it is off by default.

"Interrupt-tolerant but not interruptible" means that interrupts can occur
and will not affect the execution of the code, but control will not jump to
the ISR: for the duration of the pipelined loop kernel, interrupts are
turned off.
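
On the question of whether such an interrupt is missed: as far as the C6000 interrupt model goes, a request that arrives while GIE is clear is latched as a bit in the interrupt flag register (IFR) and is taken once GIE is set again. What can be lost is a second request of the same type that arrives before the first has been serviced, since there is only one flag bit per interrupt. A toy host-side model of that behaviour follows; it ignores IER and priorities, and cpu_model, raise_irq and poll_irq are made-up names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned ifr;       /* pending flags, one bit per interrupt   */
    unsigned gie;       /* global interrupt enable, 0 or 1        */
    int      serviced;  /* how many times the ISR has been run    */
} cpu_model;

/* A peripheral signals interrupt number n: the flag is latched,
 * whether or not interrupts are currently enabled. */
void raise_irq(cpu_model *cpu, unsigned n)
{
    assert(cpu != NULL && n < 16u);
    cpu->ifr |= 1u << n;
}

/* Called at a point where the CPU could take an interrupt.
 * Returns 1 if interrupt n was serviced, 0 if nothing happened
 * (either no request is pending or GIE is clear and it stays pending). */
int poll_irq(cpu_model *cpu, unsigned n)
{
    assert(cpu != NULL && n < 16u);
    if (!cpu->gie || !(cpu->ifr & (1u << n)))
        return 0;
    cpu->ifr &= ~(1u << n);   /* flag is cleared when the ISR is taken */
    cpu->serviced++;
    return 1;
}
```

In this model two raise_irq() calls for the same interrupt while gie is 0 lead to a single service once gie is set, which is why a UART FIFO (or a short non-interruptible section) matters for not dropping data.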

Regds
JS

-----Original Message-----
From: c... [mailto:c...] On Behalf Of William
C Bonner
Sent: Tuesday, January 23, 2007 12:19 PM
To: c...
Subject: [c6x] Interrupts, Iterrupt Service Routines, and C6x family of
processors

[...]
Hi Tom, Adolf,

Thank you for pointing that out to me! Are you referring to the
bottom of page 8-11 and top of page 8-12 of SPRU189F? Is it the
EP(n+3) and EP(n+4) in figures 8-12 and 8-13 of the document that
may clear GIE without affecting the processing of the interrupt,
which results in PGIE=0 in clock cycle 6?

If my guess is correct, then I now understand the purpose of the PGIE bit.
This is quite a miserable (what other adjective to use?) feature,
which looks more like a bug...

How strange is the last condition on page 8-19 regarding a branch
in EP(n+4), as this EP is in the DC phase on clock cycle 4; how
does the CPU know that it may contain a B instruction before decoding? :)

The question is whether this problem can be solved by clearing
GIE in the delay slots of a branch, e.g. the return from a subroutine,
as in the example code below:

int Disable_Interrupts (void); // returns previous CSR

        .TEXT
        .GLOBAL _Disable_Interrupts

_Disable_Interrupts:            ; asm code
        B       B3              ; begin return
        MVC     CSR, B0         ; read CSR
        MV      B0, A4          ; store previous CSR into return register
||      AND     -2, B0, B0      ; clear GIE bit
        MVC     B0, CSR         ; modify CSR
        NOP                     ; branch delay slot 4
        NOP                     ; branch delay slot 5

        .END

Since the conditions for interrupt processing are evaluated on every
clock cycle, once the delay slots of the branch have completed GIE is
clear and the fourth condition is not met, which keeps interrupts
locked out until GIE is set again.
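
To illustrate why it is useful for such a routine to return the previous CSR, here is a host-side C sketch of nested critical sections. It models the CSR as a plain variable; disable_interrupts_model mirrors the contract of the routine above, while restore_interrupts_model, inner_section and outer_section are hypothetical helpers that do not appear in the thread.

```c
#include <assert.h>
#include <stddef.h>

unsigned model_csr = 0x1u;   /* simulated CSR, GIE (bit 0) set */

/* Host-side model of Disable_Interrupts(): clears GIE, returns old CSR.
 * (prev & ~0x1u) is the C spelling of "AND -2, B0, B0". */
unsigned disable_interrupts_model(void)
{
    unsigned prev = model_csr;
    model_csr = prev & ~0x1u;
    return prev;
}

/* Hypothetical companion: restore only the GIE bit from a saved CSR. */
void restore_interrupts_model(unsigned prev)
{
    model_csr = (model_csr & ~0x1u) | (prev & 0x1u);
}

/* Inner critical section; may be called with interrupts on or off. */
void inner_section(int *counter)
{
    assert(counter != NULL);
    unsigned prev = disable_interrupts_model();
    ++*counter;
    restore_interrupts_model(prev);   /* GIE stays 0 if the caller had it 0 */
}

/* Outer critical section that calls the inner one.
 * Returns the GIE bit observed after the inner section has finished. */
unsigned outer_section(int *counter)
{
    assert(counter != NULL);
    unsigned prev = disable_interrupts_model();
    inner_section(counter);
    unsigned gie_inside = model_csr & 0x1u;
    restore_interrupts_model(prev);
    return gie_inside;
}
```

Because each level restores what it saved, the inner section does not re-enable interrupts behind the outer section's back; an unconditional "enable" at the end of inner_section would.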

Unfortunately, section 8.4 does not explain all this in full
detail, or at least so it seems to me.

In any case, I take back my words that doing something in branch delay
slots must be avoided. It has to be used, and sometimes it might be the
only way to do something right!

Thanks, take care,

Andrew

> Date: Mon, 22 Jan 2007 10:06:54 -0800 (PST)
> From: Tom Kerekes
>
> [...]