Performance and reliability are two of the most crucial issues in today's high-performance instrumentation and measurement systems, which have gained much of their performance through parallel and distributed processing. High-speed, high-density Multistage Interconnection Networks (MINs) are a widely used subsystem of parallel processing and communication systems. This paper proposes new performance models for evaluating fault-tolerant MINs, thereby establishing a sound foundation for assuring their performance and reliability with a high confidence level during parallel instrumentation. A concurrent fault detection and recovery scheme for MINs is introduced to enable a generic approach to fault tolerance by rerouting over redundant interconnection links. A switch architecture that realizes the concurrent testing and diagnosis is shown. The proposed performance models are used to evaluate the compound effect of fault-tolerant operations such as testing, diagnosis, and recovery on throughput and delay. Results are shown for single transient and permanent stuck-at faults on links and on storage units in switching elements. They show that the performance degradation caused by the fault-tolerance overhead is quite graceful, whereas the performance degradation without fault recovery is unacceptable.
- Computer System Recovery
- Data Communication Systems
- Distributed Computer Systems
- Failure Analysis
- Fault Tolerant Computer Systems
- Instrument Testing
- Mathematical Models
- Parallel Processing Systems
- Performance
- Telecommunication Links
- Concurrent Fault Detection
- Fault Detection
- Multistage Interconnection Network
- Parallel Instrumentation
- Interconnection Networks
- Diagnosis
- Distributed Systems
- Instrumentation
- Parallel Processing
- Performance Analysis
Available at: http://works.bepress.com/minsu-choi/83/