Fault-tolerant parallel applications using a network of workstations

Smith, James Antony

Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/2085

Full metadata record

DC Field	Value	Language
dc.contributor.author	Smith, James Antony	-
dc.date.accessioned	2014-02-26T14:35:55Z	-
dc.date.available	2014-02-26T14:35:55Z	-
dc.date.issued	1997	-
dc.identifier.uri	http://hdl.handle.net/10443/2085	-
dc.description	PhD thesis	en_US
dc.description.abstract	It is becoming common to employ a Network Of Workstations, often referred to as a NOW, for general purpose computing since the allocation of an individual workstation offers good interactive response. However, there may still be a need to perform very large scale computations which exceed the resources of a single workstation. It may be that the amount of processing implies an inconveniently long duration or that the data manipulated exceeds available storage. One possibility is to employ a more powerful single machine for such computations. However, there is growing interest in seeking a cheaper alternative by harnessing the significant idle time often observed in a NOW and also possibly employing a number of workstations in parallel on a single problem. Parallelisation permits use of the combined memories of all participating workstations, but also introduces a need for communication, and success in any hardware environment depends on the amount of communication relative to the amount of computation required. In the context of a NOW, much success is reported with applications which have low communication requirements relative to computation requirements. Here it is claimed that there is reason for investigation into the use of a NOW for parallel execution of computations which are demanding in storage, potentially even exceeding the sum of memory in all available workstations. Another consideration is that where a computation is of sufficient scale, some provision for tolerating partial failures may be desirable. However, generic support for storage management and fault-tolerance in computations of this scale for a NOW is not currently available and the suitability of a NOW for solving such computations has not been investigated to any large extent. The work described here is concerned with these issues. 
The approach employed is to make use of an existing distributed system which supports nested atomic actions (atomic transactions) to structure fault-tolerant computations with persistent objects. This system is used to develop a fault-tolerant "bag of tasks" computation model, where the bag and shared objects are located on secondary storage. In order to understand the factors that affect the performance of large parallel computations on a NOW, a number of specific applications are developed. The performance of these applications is analysed using a semi-empirical model. The same measurements underlying these performance predictions may be employed in estimation of the performance of alternative application structures. Using services provided by the distributed system referred to above, each application is implemented. The implementation allows verification of predicted performance and also permits identification of issues regarding construction of components required to support the chosen application structuring technique. The work demonstrates that a NOW certainly offers some potential for gain through parallelisation and that for large grain computations, the cost of implementing fault tolerance is low.	en_US
dc.description.sponsorship	Engineering and Physical Sciences Research Council	en_US
dc.language.iso	en	en_US
dc.publisher	Newcastle University	en_US
dc.title	Fault-tolerant parallel applications using a network of workstations	en_US
dc.type	Thesis	en_US
Appears in Collections:	School of Computing

Files in This Item:

File	Description	Size	Format
Smith J. 1997.pdf	Thesis	16.72 MB	Adobe PDF
dspacelicence.pdf	Licence	43.82 kB	Adobe PDF
