Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/6762

| Field | Value |
|---|---|
| Title | Advancing scientific knowledge representation: standardisation and integration in tolerogenic therapies |
| Authors | Sahar, Ayesha |
| Issue Date | 2025 |
| Publisher | Newcastle University |

Abstract:

In this thesis, we apply data integration and analysis methods, and examine the impact of data standardisation, to improve our understanding of tolerogenic dendritic cell (tolDC) therapies. Data must be standardised and structured to be useful and accessible. Emerging biological fields face particular difficulties. These include limited data availability, a lack of standardisation, and the challenge of managing knowledge from studies that use varied methodologies. These issues call for specialised techniques and strategies tailored to each field's data handling and management needs. This thesis focuses on one such emerging field, “tolerogenic dendritic cell therapy”, which has shown significant potential. As with all biomedical experiments, developing these therapies involves several crucial steps. These steps must be well documented so that studies can be compared and replicated. Reporting frameworks such as Minimum Information Models can help standardise these descriptions. In this field, the Minimum Information about Tolerogenic Antigen-Presenting cells (MITAP) model was created in 2016 for this purpose. We evaluate MITAP’s impact on tolDC therapies by analysing a selection of the literature. We found that only a minority of relevant papers (14%) use MITAP, but papers that apply it make slightly more metadata available. This suggests that MITAP has had some success, but further effort is needed before standardised reporting becomes widespread in the discipline. To further support the comparison, repurposing and reuse of data about tolDC therapies, we built a method to identify the most significant information related to these therapies and integrate it into a knowledge graph. A key requirement of the knowledge graph is that the merged data is relevant to the field.

We use knowledge extraction techniques to identify and collect relevant information from research articles. We then integrate this information with publicly available datasets to enrich the knowledge base. The resulting knowledge graph comprises 120k entities extracted from full-text articles, together with 92k relationships integrated from other relevant databases. Extracting knowledge from research articles ensured that the integrated data was relevant to the field. It also let us draw more insight from publications whose experimental data is unpublished, as the example queries show. This knowledge graph can serve as a basis for generating further hypotheses. It can also act as a database for storing and retrieving information about tolDC therapies. With the knowledge graph built, we turn to queries that improve our understanding of three areas: the degree of standardisation in tolDC therapies, the underlying biology, and the social environment of the field. We formulated diverse queries covering concerns about heterogeneity. The results show that tolKG, our tolDC knowledge graph, answers these queries promptly. Answering them conventionally would require either specialised expertise or substantial manual scrutiny. Using tolKG, we streamline tasks such as comparison and analysis and can even support the generation of novel hypotheses. In summary, we found that a knowledge graph is an effective way to integrate data. Adding data from the literature makes it more meaningful, especially in emerging fields where experimental data is rarely shared. Text mining the literature yields more relationships that are specific to a field, which supports effective analysis and comparison across the tolDC therapy field.

Together, this work lays the groundwork for applying data science methods to tolDC therapies. It enables several kinds of comparison that would not otherwise be possible. The methodologies are tailored to the data sources of tolDC therapies. Nonetheless, they are not restricted to this domain. Because they depend primarily on the input data sources, they can be applied in other areas of biology as well.

| Field | Value |
|---|---|
| Description | PhD Thesis |
| URI | http://hdl.handle.net/10443/6762 |
| Appears in Collections | School of Computing |
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| Sahar A 2025.pdf | Thesis | 15.99 MB | Adobe PDF |
| dspacelicence.pdf | Licence | 43.82 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.