Abstract
Objectives: To design and build an information architecture that facilitates the early identification of research evidence, its rapid curation, classification and synthesis, and its use in guideline development. The work is under way at the National Institute for Health and Care Excellence (NICE), England; the EPPI-Centre, UCL; and MAGIC.
Results: A new federated search platform (‘HDAS’) was developed. Users can set up searches across multiple databases (e.g. PubMed, Embase, CINAHL), using a bespoke ‘language’ to translate searches from one database provider to another. Searches can be set to run periodically, with the references downloaded and deduplicated against a master ‘index’ database of studies. References are then downloaded into EPPI-Reviewer and classified, using machine learning, according to the guideline domain they ‘belong’ to. The full text of references with a high probability of relevance can be identified and retrieved automatically. Key concepts, and structured data from tables, are extracted automatically. References are scanned by human users and incorporated into syntheses. The results are published as web services, consumed by the MAGICapp platform, and made available to guideline developers. Core technologies include semantic web technologies and appropriate ontologies and controlled vocabularies, which facilitate the effective sharing and re-use of data.
Conclusions: This pilot system architecture demonstrates how emerging technologies can greatly enhance the efficiency of research surveillance and use.
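As an illustration of the deduplication step described in the Results, the following is a minimal sketch of how newly downloaded references might be checked against a master index. It is not the HDAS or EPPI-Reviewer implementation: the `Reference` record, the matching rule (DOI, or normalised title plus year) and the in-memory dictionary standing in for the index database are all assumptions made for the example, and the DOIs shown are invented.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Hypothetical bibliographic record; field names are assumptions."""
    title: str
    year: int
    doi: str = ""
    source: str = ""


def match_keys(ref):
    """Return the keys under which a reference is looked up in the index.

    Assumed rule: a DOI key when a DOI is present, plus a key built from
    the normalised title and the publication year.
    """
    keys = []
    if ref.doi:
        keys.append("doi:" + ref.doi.strip().lower())
    title = re.sub(r"[^a-z0-9]+", " ", ref.title.lower()).strip()
    keys.append(f"title:{title}|{ref.year}")
    return keys


def deduplicate(batch, index):
    """Return the references in `batch` not yet present in `index`.

    `index` is a dict standing in for the master index database; it is
    updated in place so that later batches are checked against earlier ones.
    """
    new_refs = []
    for ref in batch:
        keys = match_keys(ref)
        if any(key in index for key in keys):
            continue  # already held in the master index
        for key in keys:
            index[key] = ref
        new_refs.append(ref)
    return new_refs


# Usage: the same study arrives from two providers, one without a DOI.
index = {}
a = Reference("Statins for Prevention.", 2016, "10.1000/XYZ", "PubMed")
b = Reference("Statins for prevention", 2016, "", "Embase")
c = Reference("Another study", 2017, "10.1000/abc", "CINAHL")
new = deduplicate([a, b, c], index)
```

Here `new` contains only `a` and `c`, because `b` shares a normalised title and year with `a`. Exact-key matching of this kind is deliberately simple; a production system would likely need fuzzier comparison to cope with variant titles and missing years.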