Please use this identifier to cite or link to this item:
http://theses.ncl.ac.uk/jspui/handle/10443/5920
Title: | Methodology for machine learning-based micro-event detection in non-stationary multidimensional temporal data |
Authors: | Sokolovsky, Artur |
Issue Date: | 2022 |
Publisher: | Newcastle University |
Abstract: | The last two decades have seen a significant increase in data availability, especially for online communities: activity and content volume have grown substantially within social networks, forum communities and electronic marketplaces. All of these can be seen as platforms for online community interactions. Moreover, these platforms retain the full history of the interactions, creating a large footprint and allowing us to study more subtle structures within the data. While 20 years ago researchers were only beginning to think about topic detection and tracking in textual data sources (mostly limited to news feeds), we now have a wide variety of data feeds, including Twitter, Stack Overflow and Reddit, to name the most popular ones. Furthermore, these data sources contain hundreds of gigabytes of structured and free-text data that can be easily accessed and analysed. For instance, over the last decade, many successful attempts have been made to automatically detect various event types in Twitter data, including natural disasters, local concerts, celebrity-related events and collective events such as pandemics. The more data we get access to, the more ambitious the goals we can set. Many activities in online communities are not widely advertised. This is especially true for hacker forums, as well as forums dedicated to other illegal activities. Considering the growth of these communities, it becomes essential to perform automated analysis and risk assessment of such platforms and their trends. The current work lays the basis for achieving this goal by introducing the notion of a micro-event: an event that cannot be detected from a single data record. For instance, given a single tweet in isolation, one cannot judge whether it is related to the micro-event. However, in the context of other tweets, it may be recognised as related to the micro-event.
This subtle nature of micro-events makes them extremely hard to detect reliably in an automatic way. Hence, it is essential to design a generalisable methodology that allows for reliability, reproducibility and comparability when studying micro-events. In this work, I propose a definition of micro-events, as well as a generalisable methodology for their discovery and classification. I find that both the definition and the methodology are suitable not only for textual communications but also generalise to time series data. Since it is not feasible to obtain labelled data for the above-mentioned use cases, I design datasets and experiments to mimic the described settings as closely as possible. Firstly, I apply the proposed methodology to detect FLOSS (Python packages) version release events in Stack Overflow data. The version releases are not explicitly mentioned or advertised in the data source; however, the events impact the community by introducing and deprecating packages’ functionality. This makes the proposed experiment a good example of the micro-event detection task. Secondly, I adapt the proposed methodology to financial time series data, where market patterns align well with the introduced definition of micro-events. I introduce a machine learning-tailored market pattern, a means for its automatic detection, and prediction of the price action scenarios after the event takes place. |
Description: | PhD Thesis |
URI: | http://hdl.handle.net/10443/5920 |
Appears in Collections: | School of Computing |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
SokolovskyA2022.pdf | Thesis | 5.49 MB | Adobe PDF |
dspacelicence.pdf | Licence | 43.82 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.