Please use this identifier to cite or link to this item:
Title: The effects of size on the function of an information retrieval document collection
Authors: Mushens, Brian G.
Issue Date: 1982
Publisher: Newcastle University
Abstract: A feature of research into Information Retrieval has been the continued use of small test collections in experiments. The assumption that any results will remain valid when the system is used to interrogate a large operational database is examined critically, particularly with regard to the difference in size of the collections involved and the reasons for this. Experiments investigating the MEDLARS database with reference to several sub-collections containing varying numbers of documents are described. These include analyses of single-term and two-term combination behaviour and actual retrieval searches. The effect on the clustering structure of different small sub-collections is also studied. The results obtained for MEDLARS are examined in the context of some well-known test collections, namely Cranfield 2 and INSPEC. Results for MEDLARS data indicate that very large collections (> 20,000 documents) may be necessary in order to ensure that the experimental data is indeed representative and may therefore be used to accurately predict the performance of a particular system in the operational environment.
Description: PhD Thesis
Appears in Collections: School of Computing Science

Files in This Item:
File                    Description    Size        Format
Mushens, B. 1982.pdf    Thesis         12.13 MB    Adobe PDF
dspacelicence.pdf       Licence        43.82 kB    Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.