DSPRelated.com

OT: Ariane 5 Launcher Failure

Started by Randy Yates September 1, 2015
On 9/2/2015 10:23 AM, Randy Yates wrote:
> rickman <gnuarm@gmail.com> writes:
>> [...]
>> Your idea that using integers throughout would have shown the problem
>> is specious because it would have had the exact same thought process
>> throughout.
>
> What competent programmer uses an integer without knowing and
> considering its range and its appropriateness to the task at-hand?
In mission-critical apps, the same is true for floating point. If I were designing a signal generator I would not spend so much time or effort on the numerical issues. But I can't see anyone designing such a critical system as we are discussing just saying, "this is floating point, I don't need to think". That seems to be what you are implying, and that is what I reject.
> I'm not sure what your point is, Rick. It could be this: they KNEW the
> variable would go out-of-range if one or more of the inputs were outside
> a certain range, and they chose to consider that a catastrophic error.
>
> Yes, if you knew clearly this was the case, I guess you could say the
> use of floating or integer had nothing to do with it.
So what is your point? Sounds like you are agreeing with me now.
-- 
Rick
rickman  <gnuarm@gmail.com> wrote:

>On 9/2/2015 10:23 AM, Randy Yates wrote:
>> rickman <gnuarm@gmail.com> writes:
>>> [...]
>>> Your idea that using integers throughout would have shown the problem
>>> is specious because it would have had the exact same thought process
>>> throughout.
>> What competent programmer uses an integer without knowing and
>> considering its range and its appropriateness to the task at-hand?
>In mission critical apps, the same is true for floating point. If I
>were designing a signal generator I would not spend so much time or
>effort on the numerical issues. But I can't see how anyone designing
>such a critical system as we are discussing just saying, "this is
>floating point, I don't need to think".
I'm not a rocket scientist, but off the top of my head I will say you don't want *any* code to be able to enter an exception handler during launch. Before launch, or once in orbit, sure. Maybe after you've fired your last stage and are approaching orbit. But in the early part of a launch you need to go with what you have, not throw an exception.

Steve
On 9/1/2015 11:53 PM, Randy Yates wrote:
> rickman <gnuarm@gmail.com> writes:
>
>> On 9/1/2015 6:24 PM, Randy Yates wrote:
>>> rickman <gnuarm@gmail.com> writes:
>>>
>>>> On 9/1/2015 10:59 AM, Tim Wescott wrote:
>>>>>
>>>>> Now, if I'm going to bring MY prejudices to bear on this, it was because
>>>>> the systems engineering team was of the opinion that embedded software is
>>>>> Black Magic, or they considered that it doesn't really have value because
>>>>> it doesn't show up as a line item on the bill of materials.
>>>>
>>>> Prejudice is exactly the right word.
>>>
>>> Call it what you want - if a different approach had been made, as I
>>> outlined in a post just a few minutes ago, the Europeans would be
>>> millions of dollars and a missile launch up.
>>
>> Lol. That is quite a stretch...
>
> How so? If the integer overflow issue had been dealt with at design
> time, none of the rest of the bad decisions (e.g., leaving the
> calibration code running after launch) would have mattered.
Of course. But I was responding to your criticism of using floating point, which is not the issue. The issue is the reduction of data range combined with all the other issues. That is why I said that if they had done the app 100% in integer, they would likely still have had the same problem when the result was truncated to a 16-bit integer. I don't see any reason why converting from a wide integer would work any better, or be designed any better, than converting from floating point. Some analysis said the ultimate result would fit a 16-bit integer (with inputs of a given range). That analysis was faulty because of the new operating environment, along with several other issues that allowed this to turn into a broken rocket. I didn't see this mentioned, but given the new operating conditions, every part of the algorithm should have been re-qualified for those conditions, and that clearly wasn't done properly.
-- 
Rick
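Rick's point, that narrowing a wide integer to 16 bits carries the same hazard as narrowing a double, can be illustrated with a minimal Python sketch. Python integers never overflow, so the 16-bit behaviour is emulated here; the names `wrap_to_int16` and `checked_to_int16` are hypothetical and are not taken from the Ariane code.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap_to_int16(x: int) -> int:
    """Emulate a silent two's-complement narrowing of a wide integer to 16 bits."""
    return ((x + 32768) & 0xFFFF) - 32768

def checked_to_int16(x) -> int:
    """Narrow an int or a float to int16, refusing values that do not fit.

    The range check is the same whatever the source type. Floats are
    truncated toward zero first (that is what int() does), which is just
    one possible policy.
    """
    v = int(x)  # also raises for inf (OverflowError) and NaN (ValueError)
    if not INT16_MIN <= v <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in int16")
    return v

print(wrap_to_int16(40000))  # -25536: plausible-looking, but wrong
```

`checked_to_int16(40000)` and `checked_to_int16(40000.0)` both raise, so either way someone still has to decide at design time what the system should do when the check fails.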
On 9/2/2015 5:14 PM, Steve Pope wrote:
> rickman <gnuarm@gmail.com> wrote:
>
>> On 9/2/2015 10:23 AM, Randy Yates wrote:
>
>>> rickman <gnuarm@gmail.com> writes:
>>>> [...]
>>>> Your idea that using integers throughout would have shown the problem
>>>> is specious because it would have had the exact same thought process
>>>> throughout.
>
>>> What competent programmer uses an integer without knowing and
>>> considering its range and its appropriateness to the task at-hand?
>
>> In mission critical apps, the same is true for floating point. If I
>> were designing a signal generator I would not spend so much time or
>> effort on the numerical issues. But I can't see how anyone designing
>> such a critical system as we are discussing just saying, "this is
>> floating point, I don't need to think".
>
> I'm not a rocket scientist, but of the top of my head I will say you
> don't want *any* code to be able to enter an exception handler
> during launch. Before launch; once in orbit; sure. Maybe
> after you've fired your last stage and are approaching orbit. But in
> the early part of a launch you need to go with what you have,
> not throw an exception.
I was working on a system using Transputers many years ago. They had hardware range checking which would throw an exception if a register over/underflowed during a math operation. I asked why this was a good thing, and it was pointed out that flying without knowing you had bad data would be a *very* bad thing. Better to shut down and let another system take over. In this case the other system had already shut down for the same reason, since they were running identical code. The assumption here was that a fault of this sort would be hardware related, and so unlikely that the redundant system would also be faulty. Shutting down the "bad" system was necessary to allow the other system to take over, I assume. Shutting down the second processor is clearly not a good idea, lol.

What would the rocket have done if the exception did not shut down the processor? Would it have flown properly? Or would it have possibly flown far enough off course to cause other problems like crashing on people? I expect they had other means of preventing that.
-- 
Rick
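The common-mode failure Rick describes (a backup running identical code fails on identical data) can be shown with a toy failover loop. This is a hypothetical Python sketch: `guidance_channel` and `run_with_failover` are invented names, and the channel simply faults when a value does not fit in 16 bits.

```python
from typing import Callable, Optional, Sequence

def guidance_channel(value: float) -> int:
    """Hypothetical channel: converts a value to int16 and faults if it does not fit."""
    v = int(value)
    if not -32768 <= v <= 32767:
        raise OverflowError("operand out of int16 range")
    return v

def run_with_failover(value: float,
                      channels: Sequence[Callable[[float], int]]) -> Optional[int]:
    """Try each redundant channel in turn; return None if every channel faults."""
    for channel in channels:
        try:
            return channel(value)
        except OverflowError:
            continue  # treat this channel as failed and hand over to the next one
    return None

# Two copies of the same code: one design error takes out both.
print(run_with_failover(40000.0, [guidance_channel, guidance_channel]))  # None
```

Redundancy of this kind protects against faults that are independent between channels, such as random hardware failures; it does nothing against a design error that every copy shares.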
On 9/2/15 12:24 AM, Randy Yates wrote:
> Les Cargill<lcargill99@comcast.com> writes:
>
>> Randy Yates wrote:
>>> spope33@speedymail.org (Steve Pope) writes:
>>>
>>>> Randy Yates<yates@digitalsignallabs.com> wrote:
>>>>
>>>>> I find it almost laughable (if it weren't for the expense and danger
>>>>> such a failure had or potentially had) that the root cause was a
>>>>> conversion from float to integer! It supports a "feeling" I've had for a
>>>>> long time that coding in float is dangerous for just such reasons.
>>>>
>>>> This puts you with Von Neumann.
>>>
>>> I'm not sure if that's a complement or a criticism...
>>>
>>>> Floats and doubles are not dangerous. They can store integers within a
>>>> certain range, just like any other format.
>>>
>>> Yes, but when humans use them, they start being sloppy!
>>
>> Oh no no no!. You cannot trust them. Although really - integer
>> saturation is just as dangerous and probably more common.
>>
>>> And if you are
>>> not sloppy, you might as well use integers/fixed-point (for many many
>>> things).
to paraphrase (or as intimated in) https://xkcd.com/163/, "Different tasks call for different [types]." there are many places where i'm convinced a floating-point (or mixed) environment is most appropriate. but it's worse than "sloppy" to make a moving-average or CIC filter using floats: you gotta make sure that what you add to the accumulator is exactly subtracted from it later, and you can't do that with float.
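r b-j's accumulator point can be checked with a recursive moving sum, which adds each new sample and later subtracts it when it leaves the window. A minimal Python sketch (the name `recursive_moving_sum` is hypothetical): with integers the add and the later subtract cancel exactly; with doubles, a large sample swallows the low bits of smaller ones, and the error stays in the accumulator after the large sample has left the window.

```python
def recursive_moving_sum(samples, window):
    """Running sum over the last `window` samples, updated recursively:
    add the newest sample, subtract the one that just left the window.

    Returns (recursive_sum, direct_sum) so the two can be compared.
    """
    acc = 0
    for i, x in enumerate(samples):
        acc += x
        if i >= window:
            acc -= samples[i - window]
    direct = sum(samples[-window:])  # recomputed from scratch for reference
    return acc, direct

# Integers: the recursive and direct sums agree exactly.
print(recursive_moving_sum([10**16, 1, 1, 1], 2))      # (2, 2)
# Doubles: 1e16 + 1.0 rounds back to 1e16, so the lost units never come back.
print(recursive_moving_sum([1e16, 1.0, 1.0, 1.0], 2))  # (0.0, 2.0)
```

Python integers are unbounded, so this sidesteps overflow; in a real CIC filter the integrators rely on two's-complement wraparound, which also cancels exactly as long as the registers are wide enough for the output range.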
>>>> Programmers however are dangerous.
>>>
>>> This sounds a lot like the anti-gun-control sentiment..
>>>
>>
>> "Floating point doesn't kill rockets... programmers
>> kill rockets... "
>
woot!!
> Exactly!!! True enough...
(i disagree with the original anti-gun-control canard, but i love the comparison.)
-- 
r b-j  rbj@audioimagination.com
"Imagination is more important than knowledge."
On Wed, 2 Sep 2015 21:14:18 +0000 (UTC), spope33@speedymail.org (Steve Pope) wrote:

>rickman <gnuarm@gmail.com> wrote:
>
>>On 9/2/2015 10:23 AM, Randy Yates wrote:
>
>>> rickman <gnuarm@gmail.com> writes:
>>>> [...]
>>>> Your idea that using integers throughout would have shown the problem
>>>> is specious because it would have had the exact same thought process
>>>> throughout.
>
>>> What competent programmer uses an integer without knowing and
>>> considering its range and its appropriateness to the task at-hand?
>
>>In mission critical apps, the same is true for floating point. If I
>>were designing a signal generator I would not spend so much time or
>>effort on the numerical issues. But I can't see how anyone designing
>>such a critical system as we are discussing just saying, "this is
>>floating point, I don't need to think".
>
>I'm not a rocket scientist, but of the top of my head I will say you
>don't want *any* code to be able to enter an exception handler
>during launch. Before launch; once in orbit; sure. Maybe
>after you've fired your last stage and are approaching orbit. But in
>the early part of a launch you need to go with what you have,
>not throw an exception.
>
>Steve
Margaret Hamilton disagrees. The very beginning of "engineered" software included exception handling that essentially saved the first Apollo Moon landing:

https://boingboing.net/2015/05/07/photo-celebrates-unsung-nasa-s.html

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
Eric Jacobsen <eric.jacobsen@ieee.org> wrote:

>On Wed, 2 Sep 2015 21:14:18 +0000 (UTC), spope33@speedymail.org (Steve
>>I'm not a rocket scientist, but of the top of my head I will say you
>>don't want *any* code to be able to enter an exception handler
>>during launch. Before launch; once in orbit; sure. Maybe
>>after you've fired your last stage and are approaching orbit. But in
>>the early part of a launch you need to go with what you have,
>>not throw an exception.
>Margaret Hamilton disagrees. The very beginning of "engineered"
>software included exception handling that essentially saved the first
>Apollo Moon landing:
>
>https://boingboing.net/2015/05/07/photo-celebrates-unsung-nasa-s.html
To me, the above link does not describe exception handling, and certainly not arithmetic exceptions. It seems mostly to describe operating system design.

Steve
In article <ms7r1k$tqe$1@dont-email.me>, rickman  <gnuarm@gmail.com> wrote:

>On 9/2/2015 5:14 PM, Steve Pope wrote:
>> I'm not a rocket scientist, but of the top of my head I will say you
>> don't want *any* code to be able to enter an exception handler
>> during launch. Before launch; once in orbit; sure. Maybe
>> after you've fired your last stage and are approaching orbit. But in
>> the early part of a launch you need to go with what you have,
>> not throw an exception.
>I was working on a system using Transputers many years ago. They had
>hardware range checking which would throw an exception if a register
>over/underflowed during a math operation.
Almost all CPUs can do this.
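To make Steve's remark concrete: many processors set a signed-overflow flag after an add (whether that flag also traps is a separate design decision), and the usual rule is that overflow occurs exactly when both operands have the same sign and the wrapped result has the opposite sign. A minimal Python sketch emulating a 16-bit add; `add_int16` is a hypothetical name, not any particular CPU's instruction.

```python
def add_int16(a: int, b: int) -> tuple[int, bool]:
    """Emulate a 16-bit two's-complement add: return (wrapped result, overflow flag)."""
    raw = (a + b) & 0xFFFF                              # low 16 bits, as the register holds them
    result = raw - 0x10000 if raw & 0x8000 else raw     # reinterpret as signed
    # Python ints act like infinitely sign-extended two's complement, so each XOR
    # is negative exactly when the result's sign differs from that operand's sign.
    overflow = ((a ^ result) & (b ^ result)) < 0
    return result, overflow

print(add_int16(30000, 10000))  # (-25536, True)
print(add_int16(100, -50))      # (50, False)
```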
>I asked why this was a good
>thing and it was pointed out that flying without knowing you had bad
>data would be a *very* bad thing. Better to shut down and let another
>system take over. In this case the other system had already shut down
>for the same reason since they were running identical code. The
>assumption here was that a fault of this sort would be hardware related
>and so unlikely that the redundant system would also be faulty.
>Shutting down the "bad" system was necessary to allow the other system
>to take over I assume. Shutting down the second processor is clearly
>not a good idea, lol.
>What would the rocket have done if the exception did not shut down the
>processor?
As someone suggested above, it could have been coded to saturate the value into the integer type, and depending on details this could have resulted in the system continuing to operate. (But one other important design task they clearly did not do is regress the code against all possible input conditions... that would have caught the unexpected exception.)
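The saturating alternative can be sketched in a few lines of Python. `saturate_to_int16` is a hypothetical name, and mapping NaN to 0 is an arbitrary policy chosen for the example; the point is that every input, including infinities, produces some defined int16 value instead of an exception. The sweep at the end is only a toy stand-in for the kind of input-range regression Steve describes.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767

def saturate_to_int16(x: float) -> int:
    """Convert to int16, clamping out-of-range values instead of raising."""
    if math.isnan(x):
        return 0  # arbitrary policy for this sketch: NaN maps to zero
    if x >= INT16_MAX:
        return INT16_MAX
    if x <= INT16_MIN:
        return INT16_MIN
    return int(x)  # in range: truncate toward zero

# Toy regression sweep: every input, however extreme, must land in int16 range.
for x in (-math.inf, -1e9, -32768.5, -1.5, 0.0, 123.7, 32767.9, 4e4, math.inf, math.nan):
    assert INT16_MIN <= saturate_to_int16(x) <= INT16_MAX
```

Whether saturating is actually safer than shutting down depends on what consumes the value: a clamped value is still wrong, just bounded.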
>Would it have flown properly? Or would it have possibly
>flown far enough off course to cause other problems like crashing on
>people? I expect they had other means of preventing that.
The range control officer can always prevent that -- I'm pretty sure the destruct command is not processed by software.

Steve
Steve Pope <spope33@speedymail.org> wrote:
> Randy Yates <yates@digitalsignallabs.com> wrote:
>>Let me ask a question: what if the alignment algorithm designer had used
>>ONLY a 16-bit integer for the horizontal bias. Then, AT DESIGN TIME, the
>>algorithm designer would have been forced to consider out-of-range input
>>and choose the action more intelligently.
(snip)
> This is why you want to use fixed-point types.
> (Which is not the same as storing a value in an integer type. Very
> often fixed-point types are stored in doubles.)
The use of fixed point non-integer data types seems to be a lost art.

Not so long after I started computer programming, I was doing PL/I programming. (I mostly don't now, as the compilers are harder to find.) PL/I has scaled fixed point in both binary and decimal. While doing scaled decimal fixed point on hardware with BCD arithmetic seems somewhat obvious, it can also be done in binary. One IBM PL/I implementation (from CALL/OS) that I used many years ago did FIXED DECIMAL in binary. It does tend to require some multiply and divide by powers of 10 that BCD doesn't require.

Many tasks, DSP being one, are best done in fixed point, but the usual languages don't help much at all. Floating point is nice for quantities with relative uncertainty, but fixed point is better with absolute uncertainties. Some examples given by Knuth are finance and typesetting.

We learn fixed point math in 3rd or 4th grade, and then forget how to do it later. (The rules for binary and decimal are the same, if you use 'binary point' instead of 'decimal point'.)

Quick test: if you have two ratios, A:B and C:D, how do you find out if they are equal?

-- glen
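glen's points about scaled decimal fixed point done in binary, and his closing quiz, can both be illustrated with a small Python sketch. `ScaledDecimal` is a hypothetical class, loosely in the spirit of PL/I FIXED DECIMAL but not modeled on its precision or rounding rules: the value is stored as a binary integer count of 10**-scale units, and lining up decimal points costs a multiply by a power of 10, the overhead BCD hardware avoids. For the ratio question, cross-multiplying (A*D == B*C, with B and D nonzero) keeps the comparison in exact integer arithmetic, with no division at all.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScaledDecimal:
    """Decimal fixed point held in a binary int: value = units / 10**scale."""
    units: int
    scale: int  # digits after the decimal point

    def rescaled(self, scale: int) -> "ScaledDecimal":
        # Adding digits after the point is exact: multiply by a power of 10.
        if scale < self.scale:
            raise ValueError("reducing the scale would discard digits")
        return ScaledDecimal(self.units * 10 ** (scale - self.scale), scale)

    def __add__(self, other: "ScaledDecimal") -> "ScaledDecimal":
        # Line up the decimal points first, as in grade-school addition.
        s = max(self.scale, other.scale)
        return ScaledDecimal(self.rescaled(s).units + other.rescaled(s).units, s)

    def __mul__(self, other: "ScaledDecimal") -> "ScaledDecimal":
        # Digits after the point add up on multiplication, again as taught in school.
        return ScaledDecimal(self.units * other.units, self.scale + other.scale)

    def __str__(self) -> str:
        sign = "-" if self.units < 0 else ""
        whole, frac = divmod(abs(self.units), 10 ** self.scale)
        return f"{sign}{whole}.{frac:0{self.scale}d}" if self.scale else f"{sign}{whole}"

def ratios_equal(a: int, b: int, c: int, d: int) -> bool:
    """A:B == C:D, tested by cross-multiplication (b and d must be nonzero)."""
    return a * d == b * c

print(ScaledDecimal(10, 2) + ScaledDecimal(20, 2))  # 0.30, exactly
print(0.1 + 0.2)                                    # 0.30000000000000004 in binary floating point
print(ScaledDecimal(150, 2) * ScaledDecimal(3, 0))  # 4.50
print(ratios_equal(2, 6, 5, 15))                    # True
```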
Eric Jacobsen <eric.jacobsen@ieee.org> wrote:
> On Wed, 02 Sep 2015 16:11:01 -0400, robert bristow-johnson
(snip on ADA)
>>i was told that ADA was meant to "become all things to all men." (a
>>biblical reference for those who might not recognize it.)
> I never heard it described that way, but it was originally developed
> to force a bit more discipline in many error-prone errors in order to
> increase code reliability, traceability, readability, etc., etc. It
> has since fallen out of favor a bit after a few decades of
> demonstrating that bad coders will write crap no matter what language
> you hand them.
But its descendant, VHDL, is still with us. Personally, I like verilog more, but after learning a few rules, writing structural verilog in VHDL isn't that hard. There are a few convenient operations that were added to VHDL so recently that most parsers don't know them yet.
> It is still used in some areas, partly for legacy purposes. It used
> to be required on a lot of military projects, but not so much any
> more.
-- glen