DSPRelated.com
Forums

6455 Ethernet network error

Started by feng...@gmail.com February 10, 2011
Hello,

In my project we use Gigabit Ethernet to transfer data from our 6455-based image board to an image server (a high-performance computer). Our circuit board is mounted inside a machine.

I used NDK 2.0, starting from the helloWorld.pjt example that comes with the NDK, to develop my own project (TCP/IP). The program running on the server was written by another developer, and I don't know much about it.

(The data rate is about 500 Mbps. Previously I tested the 6455 Ethernet link with another computer and reached more than 700 Mbps.)

Normally the network transfer runs smoothly for several hours. But sometimes the throughput suddenly drops to zero. When that happens, my DSP project shows that send() times out (it returns -1 with network error code 35). Sockets are in blocking mode by default, and I set the send timeout to 12 s.
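A note on what error 35 most likely means. In the BSD errno numbering, 35 is EAGAIN (alias EWOULDBLOCK), which is what a blocking send() reports when its SO_SNDTIMEO timeout expires. Assuming the NDK follows that numbering (worth confirming in its serrno.h), the socket's send buffer stayed full for the whole 12 s, i.e. the receiving side stopped reading or the link stopped carrying data. The sketch below is a host-side illustration with POSIX sockets, not NDK code: the NDK's socket calls are BSD-like but retrieve the error through fdError() rather than errno. set_send_timeout() and send_all() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Set the send timeout on a blocking socket (POSIX SO_SNDTIMEO). */
static int set_send_timeout(int fd, long seconds, long micros)
{
    struct timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = micros;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
}

/* Send all len bytes, looping over partial sends.
 * Returns 0 on success, -1 if the send timeout expired (errno EAGAIN or
 * EWOULDBLOCK: the buffer stayed full because the peer is not draining it),
 * and -2 on any other error. *sent receives the bytes actually sent. */
static int send_all(int fd, const void *buf, size_t len, size_t *sent)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = send(fd, p + done, len - done, 0);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                       /* interrupted: just retry */
        *sent = done;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return -1;                      /* timed out with data pending */
        return -2;
    }
    *sent = done;
    return 0;
}
```

In a test, a socketpair whose peer never reads reproduces the symptom: once the socket buffer is full and the timeout expires, send_all() returns -1 with errno EAGAIN/EWOULDBLOCK. Logging `sent` at that moment shows how far the transfer got before the receiver stalled.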

I have run the network program on the board (outside the working machine) with my computer for several days, and the problem never occurred.

Is the problem caused by the DSP or by the server, or does the complex electrical environment inside the machine disturb the circuit board and the Ethernet peripherals?

Thanks in advance!

_____________________________________
Francis,

I haven't seen the code, so I can only guess.

Here are my guesses.

--a table overflows
--a malloc fails
--a memory leak
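The second and third guesses can be checked without the stack source by watching the heap over time. A minimal sketch, assuming the application's own allocations go through a wrapper; tracked_malloc(), tracked_free() and the g_* counters are hypothetical names, the counters are not protected against concurrent tasks, and allocations made inside the NDK itself are not seen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counters for allocations made through tracked_malloc().
 * A live-byte count that keeps rising over hours points to a leak. */
static size_t g_live_bytes;
static size_t g_live_blocks;
static size_t g_failed_allocs;

/* Header stored in front of each block so tracked_free() knows its size.
 * The extra union members give alignment suitable for common types. */
typedef union {
    size_t size;
    long double ld;
    long long ll;
    void *vp;
} alloc_hdr;

void *tracked_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(alloc_hdr)) {
        g_failed_allocs++;
        return NULL;
    }
    alloc_hdr *h = malloc(sizeof *h + size);
    if (h == NULL) {
        g_failed_allocs++;   /* "a malloc fails": count it, don't ignore it */
        return NULL;
    }
    h->size = size;
    g_live_bytes += size;
    g_live_blocks++;
    return h + 1;            /* caller's memory starts after the header */
}

void tracked_free(void *p)
{
    if (p == NULL)
        return;
    alloc_hdr *h = (alloc_hdr *)p - 1;
    g_live_bytes -= h->size;
    g_live_blocks--;
    free(h);
}
```

Printing the three counters once a minute during a long run is usually enough: a steady climb in g_live_bytes suggests a leak, and a nonzero g_failed_allocs shortly before the stall suggests the heap ran out.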

I would try to obtain the source for the TCP/IP stack and debug it, as that
sounds like where the failure is occurring.

R. Williams
---------- Original Message -----------
From: f...@gmail.com
To: c...
Sent: Thu, 10 Feb 2011 22:22:09 -0500
Subject: [c6x] 6455 Ethernet network error

------- End of Original Message -------

_____________________________________
Feng Li-

I suggest running a test in which you control exactly the number of packets (and their lengths). If the test is repeatable
and the time until failure is about the same each run (3 hrs? 4 hrs?), then it's probably a memory leak, a bad
pointer, or some other software issue that takes time to develop.
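One way to make every run send exactly the same traffic is to drive both packet lengths and payload bytes from a fixed-seed pseudo-random generator. A minimal sketch, assuming a xorshift32 generator (chosen here only because it is tiny and deterministic); pattern_gen, pattern_init(), pattern_next() and pattern_packet() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deterministic traffic pattern: a fixed-seed xorshift32
 * generator, so every test run produces byte-for-byte identical packets. */
typedef struct {
    uint32_t state;
} pattern_gen;

static void pattern_init(pattern_gen *g, uint32_t seed)
{
    g->state = seed ? seed : 1u;   /* xorshift must not start at 0 */
}

static uint32_t pattern_next(pattern_gen *g)
{
    uint32_t x = g->state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    g->state = x;
    return x;
}

/* Fill buf with the next packet, whose length lies in [min_len, max_len]
 * (clipped to cap). Returns that length. Assumes min_len <= max_len. */
static size_t pattern_packet(pattern_gen *g, uint8_t *buf, size_t cap,
                             size_t min_len, size_t max_len)
{
    size_t span = max_len - min_len + 1;
    size_t len = min_len + (size_t)(pattern_next(g) % span);
    if (len > cap)
        len = cap;
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)pattern_next(g);
    return len;
}
```

If the receiver runs the same generator with the same seed, it can also check every packet it gets, which catches corrupted or dropped data in addition to making the run repeatable.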

A couple of years ago I was involved in debugging a problem that took 22 hrs to fail. A bad write pointer was
very slowly advancing through memory until it finally overwrote a critical location, and then 'boom', total hardware
freeze. The lock-up was so bad there was no way to debug (look at memory values, program execution trace, etc.).
Initially the failures seemed intermittent, but once we created a precisely controlled test, with all I/O data generated
exactly the same each time, it became clear that the time until failure was repeatable.
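A slowly advancing bad pointer can often be caught before it reaches the critical location by surrounding important data with guard words and checking them periodically (for example from a low-priority task), logging the time of the first damage. A minimal sketch; GUARD_WORD, guarded_region, guard_init() and guard_check() are hypothetical names, and the payload array only stands in for a real critical table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard pattern; any fixed, unlikely value works. */
#define GUARD_WORD  0xDEADBEEFu
#define GUARD_WORDS 16

/* A critical structure surrounded by guard words. A stray pointer creeping
 * through memory will usually trash a guard before it reaches the payload. */
typedef struct {
    uint32_t pre[GUARD_WORDS];
    uint32_t payload[64];        /* stands in for a critical table */
    uint32_t post[GUARD_WORDS];
} guarded_region;

static void guard_init(guarded_region *r)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        r->pre[i] = GUARD_WORD;
        r->post[i] = GUARD_WORD;
    }
}

/* Returns -1 if all guards are intact, otherwise the index of the first
 * damaged guard word: 0..GUARD_WORDS-1 in pre, GUARD_WORDS..2*GUARD_WORDS-1
 * in post. */
static int guard_check(const guarded_region *r)
{
    for (size_t i = 0; i < GUARD_WORDS; i++)
        if (r->pre[i] != GUARD_WORD)
            return (int)i;
    for (size_t i = 0; i < GUARD_WORDS; i++)
        if (r->post[i] != GUARD_WORD)
            return (int)(GUARD_WORDS + i);
    return -1;
}
```

Because the check runs while the system is still alive, the first report arrives before the total freeze, and the damaged index shows which direction the stray writes are coming from.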

-Jeff

_____________________________________
Feng Li,

On 2/11/2011 1:11 AM, Jeff Brower wrote:
>
> Feng Li-
>
> > In my project we use Gigabit Ethernet to transfer data
> > from our image board with 6455 to image server (a
> > high-performance computer). Our circuit board is
> > placed inside a machine.
> >
> > I use NDK 2.0 and the helloWorld.pjt with NDK to
> > develop my own project (TCP-IP protocol). The program
> > running on the server is developed by another guy, and
> > I don't know much about it.
> >
> > (The data rate is about 500 Mbps. Previously I tested
> > 6455 Ethernet with another computer, and got 700+ Mbps
> > speed.)
> >
> > Normally the network transaction works well and
> > reposefully for several hours. But sometimes, the
> > network speed suddenly shuts down. When that happens,
> > from my dsp project, I see that the network send()
> > function times out (send() returns -1, network error
> > code 35). By default, TCP uses BLOCK mode, and I set
> > SEND time limit 12s.
>

You did not describe your network configuration. As an extension
to what Jeff said, I prefer to troubleshoot (if possible) on a simple
network that contains only the client and server. It often makes the
failures more predictable/deterministic.

mikedunn


_____________________________________
Richard and Jeff,
Thanks very much for your guesses and suggestions.
I will check my code first.
Francis

_____________________________________
If you don't know assembly language, C or C++, but you write directly
to memory space if possible, that is the best way for you. Please
give ideas.

On 2/11/11, f...@gmail.com wrote:
> Richard and Jeff,
> Thanks very much for your guesses and suggestions.
> I will check my code firstly.
> Francis
>

_____________________________________
mikedunn,
In my original post I mentioned:
"I have run the network program on the board (outside the working machine) with my computer for several days, and the problem never occurred."
The test program differs from the one that runs in the machine. However, both were developed with the NDK and are based on the helloWorld.pjt example that the NDK provides.
One possibility is that the board and my project are fine, and the program running on the server has bugs.
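One way to test that possibility is to temporarily replace the server program with a trivial receiver that does nothing but read and count bytes. A receiver that stops calling recv() is exactly what makes the sender's blocking send() time out, so if the DSP runs for days in the real machine against such a sink, the server application becomes the prime suspect; if it still stalls, the server is less likely to be the cause. A minimal sketch of the receive loop, written with POSIX read() on a host; drain_and_count() is a hypothetical name, and socket setup and accept() are omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical minimal stand-in server loop: read and discard everything
 * until the sender closes the connection, counting the bytes received.
 * Returns the byte count, or -1 on a read error. */
static int64_t drain_and_count(int fd)
{
    char buf[64 * 1024];
    int64_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            total += n;              /* keep draining as fast as possible */
        else if (n == 0)
            return total;            /* orderly shutdown by the sender */
        else if (errno != EINTR)
            return -1;               /* real error; EINTR is just retried */
    }
}
```

Comparing the returned count with the number of bytes the DSP believes it sent also shows whether any data went missing.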
Francis

_____________________________________
krishnan rani,
I know some C.
I don't understand your post. Could you please explain it more clearly? Thank you!
Francis

_____________________________________