Please use this identifier to cite or link to this item:
http://theses.ncl.ac.uk/jspui/handle/10443/3301
Title: | Antares :a scalable, efficient platform for stream, historic, combined and geospatial querying |
Authors: | Simmonds, Rebecca Maria |
Issue Date: | 2016 |
Publisher: | Newcastle University |
Abstract: | Traditional methods for storing and analysing data are proving inadequate for processing \Big Data". This is due to its volume, and the rate at which it is being generated. The limitations of current technologies are further exacerbated by the increased demand for applications which allow users to access and interact with data as soon as it is generated. Near real-time analysis such as this can be partially supported by stream processing systems, however they currently lack the ability to store data for e cient historic processing: many applications require a combination of near real-time and historic data analysis. This thesis investigates this problem, and describes and evaluates a novel approach for addressing it. Antares is a layered framework that has been designed to exploit and extend the scalability of NoSQL databases to support low latency querying and high throughput rates for both stream and historic data analysis simultaneously. Antares began as a company funded project, sponsored by Red Hat the motivation was to identify a new technology which could provide scalable analysis of data, both stream and historic. The motivation for this was to explore new methods for supporting scale and e ciency, for example a layered approach. A layered approach would exploit the scale of historic stores and the speed of in-memory processing. New technologies were investigates to identify current mechanisms and suggest a means of improvement. Antares supports a layered approach for analysis, the motivation for the platform was to provide scalable, low latency querying of Twitter data for other researchers to help automate analysis. Antares needed to provide temporal and spatial analysis of Twitter data using the timestamp and geotag. The approach used Twitter as a use case and derived requirements from social scientists for a broader research project called Tweet My Street. Many data streaming applications have a location-based aspect, using geospatial data to enhance the functionality they provide. However geospatial data is inherently di - cult to process at scale due to its multidimensional nature. To address these di culties, - i - this thesis proposes Antares as a new solution to providing scalable and e cient mechanisms for querying geospatial data. The thesis describes the design of Antares and evaluates its performance on a range of scenarios taken from a real social media analytics application. The results show signi cant performance gains when compared to existing approaches, for particular types of analysis. The approach is evaluated by executing experiments across Antares and similar systems to show the improved results. Antares demonstrates a layered approach can be used to improve performance for inserts and searches as well as increasing the ingestion rate of the system. |
Description: | PhD Thesis |
URI: | http://hdl.handle.net/10443/3301 |
Appears in Collections: | School of Computing Science |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Simmonds, R 2016.pdf | Thesis | 7.91 MB | Adobe PDF | View/Open |
dspacelicence.pdf | Licence | 43.82 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.