SE351: Lecture 3: notfailing

fluke in notfailing

Mar 15, 2010 16:29

   TOPIC: Critical System Development

☼ Summary:
   ► Dependability requires fault avoidance, detection and tolerance
   ► Diversity & redundancy are two techniques used to attain a dependable system. You see it around the place!
      ► e.g. N-version programming, multichannel computation, multi-version programming, protection systems...
      ► Well, all of those are really examples of redundancy... I think redundancy kind of implies diversity most of the time though, since doing exactly the same thing twice just fails the same way twice
   ► For most things you'll have to weigh the benefits of the technique against the cost
   ► Dependable processes (5 qualities) (5 steps to validating a dependable system)
   ► Dependable system architecture (components, internal safety devices, external safety devices - protection systems, self monitoring architectures, n-version programming)
   ► Dependable programming (8 steps)

Dependability

☼ Dependability requirements
   ► In general, customers expect all software to be dependable, but how dependable it needs to be depends on the criticality/importance of the application
   ► Some apps have very high dependability reqs and require special techniques (e.g. medical systems, telecommunications...)

☼ Achieving dependability
   ► Fault avoidance (avoid human error => minimise faults)
   ► Fault detection (detect any faults before the system is deployed)
   ► Fault tolerance (design system so faults don't result in failure)

☼ Stupid graph on slide 6

☼ Regulated systems are those that have to be approved by a regulator
   ► e.g. nuclear systems, air traffic control systems, medical devices
   ► Devs need to produce evidence that the system is safe & dependable

☼ Diversity & redundancy
   ► Redundancy: More than one critical component available (e.g. backups)
      ► Where availability is critical there are normally backup servers
   ► Diversity: Provide the same thing in different ways so they don't fail in the same way
      ► Protects against external attacks, e.g. different OSes for different servers
      ► If your backup is exactly the same it may fail in exactly the same way
   ► Both are valid ways of making a system more dependable, but they introduce complexity, and thus more chances of error
   ► Simplicity + V&V may be a better option depending on the system
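
Rough sketch of redundancy with a backup, in Python. This is not from the lecture: `first_successful`, `primary_server` and `backup_server` are made-up names, and the "servers" are just functions standing in for real components. The idea is only that a failure of one component gets masked by trying the next one.

```python
from typing import Callable, Sequence


def first_successful(providers: Sequence[Callable[[], str]]) -> str:
    """Try each redundant provider in order and return the first result."""
    errors = []
    for provider in providers:
        try:
            return provider()
        except Exception as exc:  # any failure just moves us on to the next backup
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


# Hypothetical components: the primary is down, the backup still works.
def primary_server() -> str:
    raise ConnectionError("primary is down")


def backup_server() -> str:
    return "response from backup"


print(first_successful([primary_server, backup_server]))  # response from backup
```

Note that this only shows the redundancy half. For diversity, the backup would have to be built differently (different OS, different code), otherwise whatever killed the primary can kill the backup too.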

☼ Process diversity & redundancy
   ► Activities such as validation shouldn't depend on a single approach (e.g. testing) to validate the entire system.
   ► Using varied process activities to cross-check and complement each other helps avoid process errors.

Dependable processes

☼ Dependable processes (well-defined, repeatable) minimise software faults and produce consistent results
   ► Should not depend on individual skills
   ► Regulators use the process to check if good SE practice has been used
   ► Good processes include significant V&V
   ► Should be:
      ► Documented: sets out activities and required documentation
      ► Standardised: Standards that define how the software is to be produced and documented... Isn't this part of documentation?
      ► Auditable: Understandable by other people, so they can check and provide feedback on the process
      ► Diverse: Include redundant and diverse V&V
      ► Robust: Can recover from failures of individual process activities

☼ Validation activities
   ► Requirements inspection: check for possible issues based on past experience (uses a checklist)
   ► Requirements management: change management (document any requirements that may change or depend on other requirements)
   ► Model checking: check the model used and its consistency
   ► Design & code inspection: similar to req inspection - use a checklist and some experienced people to check for potential problems. Note that inspectors aren't responsible for actually fixing the code.
   ► Static analysis: automated checking of the source code for errors without running it (so like inspection, but done by a tool)
   ► Test planning & management
   ► Configuration management: managing all the versions and revisions

☼ Fault tolerance describes the ability of the system to continue operating despite software failure
   ► This is required for systems that need a high level of availability (since problems are inevitable)
   ► Even in technically correct software, there may be spec errors or incorrect validation, so failures can still happen

Dependable system architecture

☼ Dependable system architecture generally relies on redundancy and diversity

☼ Component reliability & quality offsets chance of component failure. Use redundant hardware & software components!
   ► e.g. the sensors or heater in a water heater

☼ Internal safety & warning devices are the next line of defence! They guard against user errors or design flaws.
   ► e.g. high-temp limit switch (interrupts power when temperature goes too high)
   ► e.g. safety software detects failure and stops power

☼ External safety devices are the last line of defence! Like... physical containment. And stuff.
   ► e.g. temperature safety valve (opens to release hot water if the temperature gets too high)
   ► e.g. pressure safety valve

☼ Rob apparently communally, platonically spooned with 6 people

☼ Protection systems are specialised systems that take emergency action when something goes wrong - they independently monitor the system & environment and shut it down if something weird happens.
   ► Should have low probability of failure on demand
   ► Redundant - often duplicates monitoring & control stuff that's already in the software that it's monitoring
   ► Diverse - uses different software
   ► Simpler than the system it's monitoring, so validating it thoroughly doesn't require too much effort
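
Rough sketch of a protection system for the water heater example, in Python. Not from the lecture: `Heater`, `protection_check` and the 90 degree limit are all assumptions for illustration. The point is that the check is tiny, separate from the control logic, and fails safe.

```python
MAX_SAFE_TEMP_C = 90.0  # assumed limit, for illustration only


class Heater:
    """Hypothetical stand-in for the equipment being protected."""

    def __init__(self) -> None:
        self.powered = True

    def shut_down(self) -> None:
        self.powered = False


def protection_check(temp_c: float, heater: Heater) -> bool:
    """Independent check: cut power unless the reading is known to be safe.

    Returns True if the protection system tripped.
    """
    # Written as "not safe" rather than "too hot" so that a NaN reading
    # (e.g. from a broken sensor) also trips the shutdown.
    if not (temp_c <= MAX_SAFE_TEMP_C):
        heater.shut_down()
        return True
    return False


heater = Heater()
print(protection_check(60.0, heater), heater.powered)   # False True
print(protection_check(120.0, heater), heater.powered)  # True False
```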

☼ Self monitoring architectures
   ► MONKEY CHANNEL
   ► Same computation is run on two different channels (aka multichannel) (diverse & redundant) - the results are compared, and if they differ, a failure is assumed
   ► Hardware and software should both be diverse, otherwise both channels can fail in the same way
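
Rough sketch of the two-channel comparison, in Python. Not from the lecture: the channels here are two made-up square root implementations (`sqrt_library` and `sqrt_newton`), and `run_on_two_channels` and `ChannelMismatch` are hypothetical names. Only the software is diverse in this toy version, since both channels run on the same machine.

```python
import math
from typing import Callable


class ChannelMismatch(Exception):
    """Raised when the two channels disagree, i.e. a failure is assumed."""


def sqrt_library(x: float) -> float:
    """Channel A: use the standard library."""
    return math.sqrt(x)


def sqrt_newton(x: float) -> float:
    """Channel B: a different algorithm (Newton's method) for the same job."""
    if x == 0:
        return 0.0
    guess = x
    for _ in range(60):
        guess = 0.5 * (guess + x / guess)
    return guess


def run_on_two_channels(
    channel_a: Callable[[float], float],
    channel_b: Callable[[float], float],
    x: float,
    rel_tol: float = 1e-9,
) -> float:
    """Run both channels and only accept the result if they agree."""
    a = channel_a(x)
    b = channel_b(x)
    if not math.isclose(a, b, rel_tol=rel_tol):
        raise ChannelMismatch(f"channel A gave {a}, channel B gave {b}")
    return a


print(run_on_two_channels(sqrt_library, sqrt_newton, 9.0))  # 3.0
```

With only two channels you can tell that something failed, but not which channel is the wrong one, so the usual reaction is to stop or hand over to a backup.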

☼ N-version programming
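   ► Multiple versions of a software system do the same computation at the same time (the number of versions should be odd, so a two-way split can't tie)
   ► Voting system! The majority output is taken as the correct one
   ► The assumption is that most faults result from component failures rather than design faults, and there's a low chance of simultaneous component failure.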
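
Rough sketch of the voting step, in Python. Not from the lecture: `majority_vote` and the three "versions" are made up, and one version is deliberately buggy to show it getting outvoted.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the output that more than half of the versions agree on."""
    winner, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError(f"no majority among outputs: {list(outputs)}")
    return winner


# Three hypothetical versions of "find the largest value".
def version_1(xs: list) -> int:
    return max(xs)


def version_2(xs: list) -> int:
    return sorted(xs)[-1]


def version_3(xs: list) -> int:
    return xs[0]  # deliberate bug: just returns the first element


data = [3, 7, 5]
outputs = [version(data) for version in (version_1, version_2, version_3)]
print(outputs)                 # [7, 7, 3]
print(majority_vote(outputs))  # 7
```

If all three versions disagree there is no majority, and the sketch raises an error instead of guessing.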

☼ Apparently culture determines how you code software so make sure you have a CULTURALLY DIVERSE team

Dependable programming

☼ Specification dependency describes how software is dependent on the specs to be correct
   ► So: develop separate software specs from the same user specs (multi-version programming)
   ► Key question is whether or not the benefit is worth the cost

☼ Dependable programming reduces the incidence of program faults and supports fault avoidance, detection and tolerance
   ► Limit visibility of information in a program
      ► Only allow components to access what's needed for implementation - removes chance of data corruption
      ► Can be controlled via abstract data types with private attributes
   ► Check all inputs
      ► Check range, size, representation (no illegal characters), reasonableness (is the value plausible in context? overlaps a bit with the size/range checks)
   ► Provide a handler for all exceptions (fault tolerance)
      ► Can either: cancel the method and return a meaningful error, carry out an alternative method, or pass the problem off to a support system
   ► Minimise error-prone constructs
      ► Some stuff in programming is just dodgy
      ► Goto statements, breaks, interrupts
      ► Floats are imprecise
      ► Pointers can corrupt data
      ► Dynamic memory allocation can cause memory overflow
      ► Recursion can cause memory overflow
      ► Inheritance implies that the code's reliant on stuff that you may not have the source for
      ► Unbounded arrays - buffer overflow failures...
      ► Default input processing may cause unintended results
   ► Provide restart capabilities
      ► e.g. regular saved states, so users don't lose everything if the system crashes
   ► Check array bounds
      ► Some languages (see: C) allow you to access memory outside of the bounds of an array by going over the top element
      ► Can cause security breaches
   ► Include timeouts when calling external components
      ► Assume failure if something takes too long to respond
   ► Name all constants that represent real-world values
      ► Easier to make changes - if the value changes you only have to change one line of the program
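
Rough sketch pulling a few of the guidelines above together (named constants, input checks, timeouts, exception handling), in Python. Not from the lecture: the temperature sensor, the function names and all the limits are assumptions for illustration.

```python
import concurrent.futures
import time

# Named constants for real-world values (all assumed for illustration).
MIN_TEMP_C = -50.0
MAX_TEMP_C = 150.0
MAX_INPUT_CHARS = 16
SENSOR_TIMEOUT_S = 0.1


def parse_temperature(raw: str) -> float:
    """Check size, representation and range before trusting an input."""
    if len(raw) > MAX_INPUT_CHARS:
        raise ValueError("input too long")
    value = float(raw)  # raises ValueError on illegal characters
    if not (MIN_TEMP_C <= value <= MAX_TEMP_C):  # also rejects NaN and inf
        raise ValueError(f"{value} is outside [{MIN_TEMP_C}, {MAX_TEMP_C}]")
    return value


def call_with_timeout(func, timeout_s: float):
    """Call an external component, assuming failure if it takes too long."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(func).result(timeout=timeout_s)
    except concurrent.futures.TimeoutError as exc:
        raise TimeoutError(f"no response within {timeout_s}s") from exc
    finally:
        pool.shutdown(wait=False)  # don't block on a call that is still running


def read_temperature(sensor, last_good: float) -> float:
    """Handle every failure by falling back to the last good value."""
    try:
        return parse_temperature(call_with_timeout(sensor, SENSOR_TIMEOUT_S))
    except Exception:  # deliberately broad: timeouts, bad input, sensor errors
        return last_good


def slow_sensor() -> str:
    """Hypothetical sensor that responds too slowly."""
    time.sleep(0.5)
    return "30.0"


print(read_temperature(lambda: "21.5", last_good=20.0))  # 21.5
print(read_temperature(lambda: "999", last_good=20.0))   # 20.0
print(read_temperature(slow_sensor, last_good=20.0))     # 20.0
```

Falling back to the last good value is just one of the options from the exception handling bullet. Returning a meaningful error or passing the problem to a support system would work the same way inside the `except` branch.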
