comp.dsp | OT: Ariane 5 Launcher Failure

Reply by Tim Wescott ● September 1, 2015

On Tue, 01 Sep 2015 23:53:33 +0200, Sebastian Doht wrote:

> On 01.09.2015 at 21:01, gyansorova@gmail.com wrote:
>> On Tuesday, September 1, 2015 at 11:36:05 PM UTC+12, Randy Yates wrote:
>>> Folks,
>>>
>>> I've been in a LinkedIn discussion in which the following analysis of an
>>> Ariane 5 failure is documented:
>>>
>>>    http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
>>>
>>> I'm just reposting it here since I find it fascinating, and I bet
>>> there are a few folks here (Tim, you come to mind especially) who
>>> might have a few things to say about it.
>>>
>>> I find it almost laughable (if it weren't for the expense and danger
>>> such a failure had or potentially had) that the root cause was a
>>> conversion from float to integer! It supports a "feeling" I've had for
>>> a long time that coding in float is dangerous for just such reasons.
>>> --
>>> Randy Yates Digital Signal Labs http://www.digitalsignallabs.com
>>
>> I thought they used ADA for such things
>>
>>
> As far as I recall they used Ada but turned all range checks of Ada off
> which makes the usage of Ada as an argument for increased safety quite
> meaningless.
> 
> "A fool with a tool is still a fool"

The article said something about overflowing an integer value and popping 
an exception.  Which sounds more like they DID hit a range check, and 
lost a rocket ship because of it.
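
To make "hit a range check" concrete, here is a rough C sketch of a checked 
float-to-16-bit conversion.  It is not the flight code (that was Ada), and the 
function name is invented for illustration.  Returning false stands in 
for the exception being raised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store x in *out only if it fits in 16 bits.
   Returning false is the stand-in for a range-check exception. */
bool checked_to_int16(double x, int16_t *out)
{
    assert(out != NULL);
    /* Written as a positive test so that NaN is rejected too. */
    if (!(x > -32769.0 && x < 32768.0)) {
        return false;       /* out of range: the caller must decide what to do */
    }
    *out = (int16_t)x;      /* truncates toward zero; result is in range */
    return true;
}
```

The check itself is the easy part.  What the caller does when it gets 
false back is the part that costs you a rocket.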

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Tim Wescott ● September 1, 2015

On Tue, 01 Sep 2015 12:01:51 -0700, gyansorova wrote:

> On Tuesday, September 1, 2015 at 11:36:05 PM UTC+12, Randy Yates wrote:
>> Folks,
>> 
>> I've been in a LinkedIn discussion in which the following analysis of an
>> Ariane 5 failure is documented:
>> 
>>   http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
>> 
>> I'm just reposting it here since I find it fascinating, and I bet there
>> are a few folks here (Tim, you come to mind especially) who might have
>> a few things to say about it.
>> 
>> I find it almost laughable (if it weren't for the expense and danger
>> such a failure had or potentially had) that the root cause was a
>> conversion from float to integer! It supports a "feeling" I've had for
>> a long time that coding in float is dangerous for just such reasons.
>> --
>> Randy Yates Digital Signal Labs http://www.digitalsignallabs.com
> 
> I thought they used ADA for such things

You remind me of the people I was working with the one time that I 
debugged ADA code.

We were a C house.  They were a bunch of ADA people with the attitude "if 
there's a bug, and there's C code, then the bug is in the C code".

The bug was in their (ADA) code, where they improperly used an ADA 
feature that's not present in C.  AND, it took a C programmer who'd never 
written a line of ADA to find the bug (and rub their noses in it until 
they opened their eyes and LOOKED).

Bad code is bad code.  There is no magic language that'll enforce error-
free software.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Steve Pope ● September 1, 2015

Tim Wescott  <seemywebsite@myfooter.really> wrote:

>You remind me of the people I was working with the one time that I 
>debugged ADA code.

>We were a C house.  They were a bunch of ADA people with the attitude "if 
>there's a bug, and there's C code, then the bug is in the C code".

>The bug was in their (ADA) code, where they improperly used an ADA 
>feature that's not present in C.  AND, it took a C programmer who'd never 
>written a line of ADA to find the bug (and rub their noses in it until 
>they opened their eyes and LOOKED).

Classic.

Steve

Reply by rickman ● September 1, 2015

On 9/1/2015 6:18 PM, Randy Yates wrote:
> rickman <gnuarm@gmail.com> writes:
>
>> On 9/1/2015 7:36 AM, Randy Yates wrote:
>>> Folks,
>>>
>>> I've been in a LinkedIn discussion in which the following analysis of an
>>> Ariane 5 failure is documented:
>>>
>>>     http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
>>>
>>> I'm just reposting it here since I find it fascinating, and I bet there
>>> are a few folks here (Tim, you come to mind especially) who might have a
>>> few things to say about it.
>>>
>>> I find it almost laughable (if it weren't for the expense and danger
>>> such a failure had or potentially had) that the root cause was a
>>> conversion from float to integer! It supports a "feeling" I've had for a
>>> long time that coding in float is dangerous for just such reasons.
>>
>> I find this conclusion to show an immense lack of understanding of the
>> cause of the failure.  Did we read the same report?
>>
>> The use of integers for the variable that was the float would not have
>> mitigated the accident.  If you had used an N bit integer the same
>> conversion to a 16 bit integer would have resulted in the same
>> overflow and conversion error.
>>
>> The two primary causes of the accident were allowing the software for
>> alignment of the strap-down inertial platform to continue to run after
>> liftoff when it received invalid inputs which resulted in the out of
>> range problem and the decision to shut down the processor on this
>> error based on the assumption that the software was not faulty but
>> rather the hardware was, which was an erroneous assumption in this
>> case.
>
> I think there are several places one could lay the "cause" (perhaps
> "root cause" was too extreme of a term). I certainly won't argue that
> one would be the decision to leave the calibration running after it was
> no longer required. That just doesn't make sense.
>
> Another could be the generic exception-handling specification that all
> exceptions were catastrophic and should result in the processor being
> shut down.
>
> Yet another could be the designers' decision to allow this to generate
> an exception at all and not test for it and take other non-exceptional
> action. That is essentially my argument.

If you read the full report they had to make some tradeoffs in the 
interest of performance.  Now that I have thought about this a bit, I 
understand their reasoning for the shutdown.  They were working with the 
premise that the software would be adequately vetted and such errors 
should not exist.  Obviously this premise was not correct in the end.


> Let me ask a question: what if the alignment algorithm designer had used
> ONLY a 16-bit integer for the horizontal bias. Then, AT DESIGN TIME, the
> algorithm designer would have been forced to consider out-of-range input
> and choose the action more intelligently. For example, instead of
> shutting the software down, they could have saturated the value. Granted
> this could have been done with the double value as well, but the point
> is that the designer is FORCED to consider the case if you are thinking with
> an integer frame-of-mind.

You may be oversimplifying the algorithm.  We don't know the details 
and it might not be suitable for 16 bit integers.  I would think if it 
were, they would have used 16 bit integers.


> If a saturation had been used, then we wouldn't be talking about
> exceptions in this report as it would have never happened.

Nothing you say here changes the fact that the problem was not due to 
the use of floating point.  If the algorithm required calculations at 
higher resolution than 16 bits a higher resolution integer result would 
still need to be truncated in some manner to fit the 16 bit integer 
receiving the data.  This would still cause the same failure if done in 
the same way under the same conditions.
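
To put that in code, here is a small C sketch.  The function names are made 
up and this is not the Ariane software.  It only shows that the narrowing 
step has the same out-of-range condition whether the wide value is an 
integer or a double.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the algorithm's internal value kept as a 32-bit integer. */
bool narrow_int32_to_int16(int32_t wide, int16_t *out)
{
    assert(out != NULL);
    if (wide < INT16_MIN || wide > INT16_MAX) {
        return false;       /* does not fit in 16 bits */
    }
    *out = (int16_t)wide;
    return true;
}

/* Hypothetical: the same value kept as a double instead. */
bool narrow_double_to_int16(double wide, int16_t *out)
{
    assert(out != NULL);
    /* Positive test so that NaN fails as well. */
    if (!(wide > -32769.0 && wide < 32768.0)) {
        return false;       /* does not fit in 16 bits */
    }
    *out = (int16_t)wide;   /* truncates toward zero */
    return true;
}
```

Feed either one a value of 40000 and it reports failure.  The number 
format of the source is not what decides the outcome.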

-- 

Rick

Reply by rickman ● September 1, 2015

On 9/1/2015 6:24 PM, Randy Yates wrote:
> rickman <gnuarm@gmail.com> writes:
>
>> On 9/1/2015 10:59 AM, Tim Wescott wrote:
>>>
>>> Now, if I'm going to bring MY prejudices to bear on this, it was because
>>> the systems engineering team was of the opinion that embedded software is
>>> Black Magic, or they considered that it doesn't really have value because
>>> it doesn't show up as a line item on the bill of materials.
>>
>> Prejudice is exactly the right word.
>
> Call it what you want - if a different approach had been made, as I
> outlined in a post just a few minutes ago, the Europeans would be
> millions of dollars and a missile launch up.

Lol.  That is quite a stretch...

-- 

Rick

Reply by Randy Yates ● September 2, 2015

Tim Wescott <seemywebsite@myfooter.really> writes:

> On Tue, 01 Sep 2015 18:33:36 -0400, Randy Yates wrote:
>
>> Tim Wescott <tim@seemywebsite.com> writes:
>> 
>>> On Tue, 01 Sep 2015 07:36:01 -0400, Randy Yates wrote:
>>>
>>>> Folks,
>>>> 
>>>> I've been in a LinkedIn discussion in which the following analysis of an
>>>> Ariane 5 failure is documented:
>>>> 
>>>>   http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
>>>> 
>>>> I'm just reposting it here since I find it fascinating, and I bet
>>>> there are a few folks here (Tim, you come to mind especially) who
>>>> might have a few things to say about it.
>>>> 
>>>> I find it almost laughable (if it weren't for the expense and danger
>>>> such a failure had or potentially had) that the root cause was a
>>>> conversion from float to integer! It supports a "feeling" I've had for
>>>> a long time that coding in float is dangerous for just such reasons.
>>>
>>> Well, I don't see that as the biggest error, or even one that, given
>>> the nature of the root problem, would have saved the thing if it was
>>> corrected.
>> 
>> Why not? If the BH conversion was protected as other variables, or an
>> integer was used that saturated, there would have been no exception
>> generated and thus no crash (due to this bug).
>
> I think that saying that the problem was that they used floating point is 
> like saying "he didn't apply the brakes early enough" about a guy who 
> went driving on wet roads with bald tires.

Modify that analogy to say,

  "he got drunk and didn't apply the brakes early enough" about a guy who 
  went driving on wet roads with bald tires.

and I think it actually becomes quite applicable. What was more the
problem, that he was driving drunk, or that his tires were bald? You
could blame either one. 

> Yes, it's _a_ correct interpretation of the evidence.  But I don't think 
> it's the _most useful_ interpretation.

Interpretations aside, the fact is the rocket would not have crashed had
this (integer) problem been avoided (assuming nothing else pretty obvious
went wrong, like an O-ring busting). If that doesn't make the
error an issue, I don't know what does.
-- 
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com

Reply by Randy Yates ● September 2, 2015

rickman <gnuarm@gmail.com> writes:

> On 9/1/2015 6:24 PM, Randy Yates wrote:
>> rickman <gnuarm@gmail.com> writes:
>>
>>> On 9/1/2015 10:59 AM, Tim Wescott wrote:
>>>>
>>>> Now, if I'm going to bring MY prejudices to bear on this, it was because
>>>> the systems engineering team was of the opinion that embedded software is
>>>> Black Magic, or they considered that it doesn't really have value because
>>>> it doesn't show up as a line item on the bill of materials.
>>>
>>> Prejudice is exactly the right word.
>>
>> Call it what you want - if a different approach had been made, as I
>> outlined in a post just a few minutes ago, the Europeans would be
>> millions of dollars and a missile launch up.
>
> Lol.  That is quite a stretch...

How so? If the integer overflow issue had been dealt with at design
time, none of the rest of the bad decisions (e.g., leaving the
calibration code running after launch) would have mattered.
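
As a rough C sketch of the kind of design-time handling I mean (the name is
invented, the NaN policy is an arbitrary assumption, and nothing here is
taken from the actual flight software), a saturating conversion never
raises anything:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical saturating conversion: clamps instead of failing. */
int16_t saturate_to_int16(double x)
{
    if (x != x) {
        return 0;           /* NaN: mapped to 0 here purely as an assumption */
    }
    if (x >= 32767.0) {
        return INT16_MAX;   /* clamp at the top of the 16-bit range */
    }
    if (x <= -32768.0) {
        return INT16_MIN;   /* clamp at the bottom of the 16-bit range */
    }
    return (int16_t)x;      /* in range: truncates toward zero */
}
```

Whether a clamped value is acceptable downstream is its own design
question, but at least it is one the designer answered on purpose.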
-- 
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com

Reply by Les Cargill ● September 2, 2015

Randy Yates wrote:
> Folks,
>
> I've been in a LinkedIn discussion in which the following analysis of an
> Ariane 5 failure is documented:
>
>    http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
>
> I'm just reposting it here since I find it fascinating, and I bet there
> are a few folks here (Tim, you come to mind especially) who might have a
> few things to say about it.
>
> I find it almost laughable (if it weren't for the expense and danger
> such a failure had or potentially had) that the root cause was a
> conversion from float to integer! It supports a "feeling" I've had for a
> long time that coding in float is dangerous for just such reasons.
>

It was a complex failure.

Floats are perfectly safe, but it takes a lot of filtering - pun
intended - to make them so.

I probably went ... 20 years not using them, though. It just didn't come 
up, plus all the math was money when there was math, so BCD was
used.

-- 
Les Cargill

Reply by Les Cargill ● September 2, 2015

spope33@speedymail.org (Steve Pope) wrote:
> Randy Yates  <yates@digitalsignallabs.com> wrote:
>
>> I find it almost laughable (if it weren't for the expense and danger
>> such a failure had or potentially had) that the root cause was a
>> conversion from float to integer! It supports a "feeling" I've had for a
>> long time that coding in float is dangerous for just such reasons.
>
> This puts you with Von Neumann.
>
> Floats and doubles are not dangerous.  They can store integers within a
> certain range, just like any other format.
>
> Programmers however are dangerous.
>

Very.

> Steve
>

-- 
Les Cargill

Reply by Les Cargill ● September 2, 2015

Randy Yates wrote:
> spope33@speedymail.org (Steve Pope) writes:
>
>> Randy Yates  <yates@digitalsignallabs.com> wrote:
>>
>>> I find it almost laughable (if it weren't for the expense and danger
>>> such a failure had or potentially had) that the root cause was a
>>> conversion from float to integer! It supports a "feeling" I've had for a
>>> long time that coding in float is dangerous for just such reasons.
>>
>> This puts you with Von Neumann.
>
> I'm not sure if that's a compliment or a criticism...
>
>> Floats and doubles are not dangerous.  They can store integers within a
>> certain range, just like any other format.
>
> Yes, but when humans use them, they start being sloppy!

Oh no no no! You cannot trust them. Although really - integer 
saturation is just as dangerous and probably more common.

> And if you are
> not sloppy, you might as well use integers/fixed-point (for many many
> things).
>
>> Programmers however are dangerous.
>
> This sounds a lot like the anti-gun-control sentiment..
>

"Floating point doesn't kill rockets... programmers
kill rockets... "

-- 
Les Cargill
