OT: Ariane 5 Launcher Failure

Started by Randy Yates September 1, 2015
On 01.09.2015 at 21:01, gyansorova@gmail.com wrote:
> On Tuesday, September 1, 2015 at 11:36:05 PM UTC+12, Randy Yates wrote:
>> Folks,
>>
>> I've been in a LinkedIn discussion in which the following analysis of an
>> Ariane 5 failure is documented:
>>
>> http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
>>
>> I'm just reposting it here since I find it fascinating, and I bet there
>> are a few folks here (Tim, you come to mind especially) who might have a
>> few things to say about it.
>>
>> I find it almost laughable (if it weren't for the expense and danger
>> such a failure had or potentially had) that the root cause was a
>> conversion from float to integer! It supports a "feeling" I've had for a
>> long time that coding in float is dangerous for just such reasons.
>> --
>> Randy Yates
>> Digital Signal Labs
>> http://www.digitalsignallabs.com
>
> I thought they used ADA for such things
>
As far as I recall, they used Ada but turned off all of Ada's range checks, which makes the use of Ada as an argument for increased safety quite meaningless. "A fool with a tool is still a fool."
rickman <gnuarm@gmail.com> writes:

> On 9/1/2015 7:36 AM, Randy Yates wrote:
>> Folks,
>>
>> I've been in a LinkedIn discussion in which the following analysis of an
>> Ariane 5 failure is documented:
>>
>> http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
>>
>> I'm just reposting it here since I find it fascinating, and I bet there
>> are a few folks here (Tim, you come to mind especially) who might have a
>> few things to say about it.
>>
>> I find it almost laughable (if it weren't for the expense and danger
>> such a failure had or potentially had) that the root cause was a
>> conversion from float to integer! It supports a "feeling" I've had for a
>> long time that coding in float is dangerous for just such reasons.
>
> I find this conclusion to show an immense lack of understanding of the
> cause of the failure. Did we read the same report?
>
> The use of integers for the variable that was the float would not have
> mitigated the accident. If you had used an N bit integer the same
> conversion to a 16 bit integer would have resulted in the same
> overflow and conversion error.
>
> The two primary causes of the accident were allowing the software for
> alignment of the strap-down inertial platform to continue to run after
> liftoff when it received invalid inputs which resulted in the out of
> range problem and the decision to shut down the processor on this
> error based on the assumption that the software was not faulty but
> rather the hardware was, which was an erroneous assumption in this
> case.
I think there are several places one could lay the "cause" (perhaps "root cause" was too extreme of a term). I certainly won't argue that one would be the decision to leave the calibration running after it was no longer required. That just doesn't make sense.

Another could be the generic exception-handling specification that all exceptions were catastrophic and should result in the processor being shut down.

Yet another could be the designers' decision to allow this to generate an exception at all and not test for it and take other non-exceptional action. That is essentially my argument.

Let me ask a question: what if the alignment algorithm designer had used ONLY a 16-bit integer for the horizontal bias? Then, AT DESIGN TIME, the algorithm designer would have been forced to consider out-of-range input and choose the action more intelligently. For example, instead of shutting the software down, they could have saturated the value. Granted this could have been done with the double value as well, but the point is that the designer is FORCED to consider the case if you are thinking with an integer frame-of-mind.

If a saturation had been used, then we wouldn't be talking about exceptions in this report as it would have never happened.

Ta-may-toe, ta-ma-toe..
--
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com
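The difference between the two policies discussed above (raise an exception on an out-of-range value versus saturate it) can be sketched in a few lines. This is only an illustration in Python, not the Ada code from the Ariane 5 inertial reference system; the function names `to_int16_checked` and `to_int16_saturating`, and the choice to map NaN to 0, are assumptions made for the example.

```python
import math

INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit.

    This mirrors the "any overflow is an exception" policy: the caller
    must be prepared to handle OverflowError (int() itself raises it
    for infinities, and ValueError for NaN).
    """
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in 16 bits")
    return result


def to_int16_saturating(value: float) -> int:
    """Convert to a 16-bit signed integer, clamping out-of-range values."""
    if math.isnan(value):
        # Hypothetical policy choice: NaN has no sensible integer value.
        return 0
    if value >= INT16_MAX:
        return INT16_MAX
    if value <= INT16_MIN:
        return INT16_MIN
    return int(value)  # in range, truncates toward zero
```

With the saturating version an out-of-range input such as `40000.0` quietly becomes `32767`; whether a clamped value is actually safe for the downstream consumer is a separate design question that the conversion itself cannot answer.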
rickman <gnuarm@gmail.com> writes:

> On 9/1/2015 10:59 AM, Tim Wescott wrote:
>>
>> Now, if I'm going to bring MY prejudices to bear on this, it was because
>> the systems engineering team was of the opinion that embedded software is
>> Black Magic, or they considered that it doesn't really have value because
>> it doesn't show up as a line item on the bill of materials.
>
> Prejudice is exactly the right word.
Call it what you want - if a different approach had been taken, as I outlined in a post just a few minutes ago, the Europeans would be millions of dollars and a missile launch up.
--
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com
spope33@speedymail.org (Steve Pope) writes:

> Randy Yates <yates@digitalsignallabs.com> wrote:
>
>>Let me ask a question: what if the alignment algorithm designer had used
>>ONLY a 16-bit integer for the horizontal bias. Then, AT DESIGN TIME, the
>>algorithm designer would have been forced to consider out-of-range input
>>and choose the action more intelligently. For example, instead of
>>shutting the software down, they could have saturated the value. Granted
>>this could have been done with the double value as well, but the point
>>is that designer is FORCED to consider the case if you are thinking with
>>an integer frame-of-mind.
>
> This is why you want to use fixed-point types.
>
> (Which is not the same as storing a value in an integer type. Very
> often fixed-point types are stored in doubles.)
Steve,

Does ADA have fixed-point types? If so, I agree violently. (I'm not up on ADA...)
--
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com
Randy Yates  <yates@digitalsignallabs.com> wrote:

>spope33@speedymail.org (Steve Pope) writes:
>> Randy Yates <yates@digitalsignallabs.com> wrote:
>>>Let me ask a question: what if the alignment algorithm designer had used
>>>ONLY a 16-bit integer for the horizontal bias. Then, AT DESIGN TIME, the
>>>algorithm designer would have been forced to consider out-of-range input
>>>and choose the action more intelligently. For example, instead of
>>>shutting the software down, they could have saturated the value. Granted
>>>this could have been done with the double value as well, but the point
>>>is that designer is FORCED to consider the case if you are thinking with
>>>an integer frame-of-mind.
>> This is why you want to use fixed-point types.
>> (Which is not the same as storing a value in an integer type. Very
>> often fixed-point types are stored in doubles.)
>Does ADA have fixed-point types? If so, I agree violently. (I'm not up
>on ADA...)
I don't recall ADA having built-in fixed-point types from when I studied ADA in fall, 1980. Almost certainly someone added them to the language at some point. Whether they occurred in a particular ADA implementation ... for a particular rocket's computer ... who knows, but "casting a float to an int" sounds like, if so, they were not being used.

Any reasonably extensible language (ADA, C++) can be extended with fixed-point types. To add the most useful System C fixed point types (and saturation / rounding modes) to C++ is about 50 lines of header file code maximum. It may not be efficient enough for embedded work, however.

Steve
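To make the idea of a fixed-point type with selectable rounding and saturation modes concrete, here is a minimal sketch. It is written in Python rather than C++ and is not the SystemC API; the function names `quantize` and `to_float`, the mode strings, and the default 16-bit word with 8 fractional bits are all assumptions chosen for illustration. It also assumes finite inputs.

```python
import math


def quantize(value, word_bits=16, frac_bits=8,
             rounding="round", overflow="saturate"):
    """Return the signed raw integer representing `value` in fixed point.

    rounding: "round" (half up) or "truncate" (toward negative infinity)
    overflow: "saturate" (clamp to range) or "wrap" (two's complement)
    """
    scaled = value * (1 << frac_bits)
    if rounding == "round":
        raw = math.floor(scaled + 0.5)
    elif rounding == "truncate":
        raw = math.floor(scaled)
    else:
        raise ValueError(f"unknown rounding mode: {rounding}")

    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    if overflow == "saturate":
        raw = max(lo, min(hi, raw))
    elif overflow == "wrap":
        raw = ((raw - lo) % (1 << word_bits)) + lo
    else:
        raise ValueError(f"unknown overflow mode: {overflow}")
    return raw


def to_float(raw, frac_bits=8):
    """Convert a raw fixed-point integer back to a float."""
    return raw / (1 << frac_bits)
```

The point of making `overflow` an explicit parameter is that the out-of-range behaviour becomes a visible decision at the place where the type is declared, rather than an accident of the conversion.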
Tim Wescott <tim@seemywebsite.com> writes:

> On Tue, 01 Sep 2015 07:36:01 -0400, Randy Yates wrote:
>
>> Folks,
>>
>> I've been in a LinkedIn discussion in which the following analysis of an
>> Ariane 5 failure is documented:
>>
>> http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
>>
>> I'm just reposting it here since I find it fascinating, and I bet there
>> are a few folks here (Tim, you come to mind especially) who might have a
>> few things to say about it.
>>
>> I find it almost laughable (if it weren't for the expense and danger
>> such a failure had or potentially had) that the root cause was a
>> conversion from float to integer! It supports a "feeling" I've had for a
>> long time that coding in float is dangerous for just such reasons.
>
> Well, I don't see that as the biggest error, or even one that, given the
> nature of the root problem, would have saved the thing if it was
> corrected.
Why not? If the BH conversion had been protected like the other variables, or an integer had been used that saturated, there would have been no exception generated and thus no crash (due to this bug).
--
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com
spope33@speedymail.org (Steve Pope) writes:

> Randy Yates <yates@digitalsignallabs.com> wrote:
>
>>I find it almost laughable (if it weren't for the expense and danger
>>such a failure had or potentially had) that the root cause was a
>>conversion from float to integer! It supports a "feeling" I've had for a
>>long time that coding in float is dangerous for just such reasons.
>
> This puts you with Von Neumann.
I'm not sure if that's a complement or a criticism...
> Floats and doubles are not dangerous. They can store integers within a > certain range, just like any other format.
Yes, but when humans use them, they start being sloppy! And if you are not sloppy, you might as well use integers/fixed-point (for many many things).
> Programmers however are dangerous.
This sounds a lot like the anti-gun-control sentiment..
--
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com
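The remark quoted above, that floats and doubles can store integers within a certain range, can be checked directly. The sketch below assumes the platform's Python `float` is an IEEE 754 double (53-bit significand, which is what `sys.float_info.mant_dig` reports on common platforms); the helper name `is_exact_in_double` is made up for this example.

```python
import sys

# Number of significand bits in the platform float (53 for IEEE 754 doubles).
MANT_BITS = sys.float_info.mant_dig
LIMIT = 2 ** MANT_BITS


def is_exact_in_double(n: int) -> bool:
    """Return True if the integer n survives a round trip through a float."""
    try:
        return int(float(n)) == n
    except OverflowError:
        # n is too large to convert to a float at all.
        return False
```

Every integer with magnitude up to `LIMIT` round-trips exactly; beyond that only some do, and adding 1.0 to `float(LIMIT)` leaves the value unchanged. Inside the exact range the arithmetic is as well behaved as integer arithmetic, which is the substance of the "floats are not dangerous" position; the failure mode appears when the value is narrowed to a smaller type.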
Randy Yates <yates@digitalsignallabs.com> writes:
> [...]
> I'm not sure if that's a complement or a criticism...
compliment
--
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com
On Tue, 01 Sep 2015 18:33:36 -0400, Randy Yates wrote:

> Tim Wescott <tim@seemywebsite.com> writes:
>
>> On Tue, 01 Sep 2015 07:36:01 -0400, Randy Yates wrote:
>>
>>> Folks,
>>>
>>> I've been in a LinkedIn discussion in which the following analysis of an
>>> Ariane 5 failure is documented:
>>>
>>> http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
>>>
>>> I'm just reposting it here since I find it fascinating, and I bet
>>> there are a few folks here (Tim, you come to mind especially) who
>>> might have a few things to say about it.
>>>
>>> I find it almost laughable (if it weren't for the expense and danger
>>> such a failure had or potentially had) that the root cause was a
>>> conversion from float to integer! It supports a "feeling" I've had for
>>> a long time that coding in float is dangerous for just such reasons.
>>
>> Well, I don't see that as the biggest error, or even one that, given
>> the nature of the root problem, would have saved the thing if it was
>> corrected.
>
> Why not? If the BH conversion was protected as other variables, or an
> integer was used that saturated, there would have been no exception
> generated and thus no crash (due to this bug).
I think that saying that the problem was that they used floating point is like saying "he didn't apply the brakes early enough" about a guy who went driving on wet roads with bald tires.

Yes, it's _a_ correct interpretation of the evidence. But I don't think it's the _most useful_ interpretation.
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
On 2015-09-01 14:59:17 +0000, Tim Wescott said:

> On Tue, 01 Sep 2015 07:36:01 -0400, Randy Yates wrote:
>
>> Folks,
>>
>> I've been in a LinkedIn discussion in which the following analysis of an
>> Ariane 5 failure is documented:
>>
>> http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
>>
>> I'm just reposting it here since I find it fascinating, and I bet there
>> are a few folks here (Tim, you come to mind especially) who might have a
>> few things to say about it.
>>
>> I find it almost laughable (if it weren't for the expense and danger
>> such a failure had or potentially had) that the root cause was a
>> conversion from float to integer! It supports a "feeling" I've had for a
>> long time that coding in float is dangerous for just such reasons.
>
> Well, I don't see that as the biggest error, or even one that, given the
> nature of the root problem, would have saved the thing if it was
> corrected.
>
> The problem, as I see it, is that when they wrote the software for the
> Ariane 4 they were a bit sloppy (in the floating-to-integer conversion).
> Then, when they decided to reuse the software in the Ariane 5 they did
> not fully consider the impact of the change in the flight trajectory --
> i.e., they were sloppy. Then, they didn't fully test the software --
> i.e., they were sloppy.
The story I seem to recall included the facts that the software worked on the A4 but the A5 was a higher-performance vehicle, which then caused the overflows, etc., as you recount.
> So they basically crashed an entire rocket system because they were
> sloppy.
>
> Now, if I'm going to bring MY prejudices to bear on this, it was because
> the systems engineering team was of the opinion that embedded software is
> Black Magic, or they considered that it doesn't really have value because
> it doesn't show up as a line item on the bill of materials.