Automated extraction of adverse drug reactions from biomedical literature and Food and Drug Administration (FDA) drug labels using machine learning

2019 Santiago

Nandal U¹, Piette E¹

¹Elsevier

Background: Elsevier is a global information analytics company leveraging its rich tradition of curating and publishing leading scientific content to power clinical and research solutions. By combining content with cutting-edge technology in artificial intelligence (AI), machine learning (ML) and natural language processing (NLP), we enable professionals to find the precise information they need to advance their research and make decisions that affect the lives of patients and whole societies. In pharmacovigilance, post-marketing drug safety surveillance is critical to the protection of public health, and monitoring diverse sources of information for cases of adverse drug reactions (ADRs) is an important but time-consuming task. Automatic extraction of ADRs from both highly regularized and variably structured content could play an important role in augmenting the information about ADRs that is obtained during short-term clinical trials.

Objectives: we aim to automatically extract ADRs from Food and Drug Administration (FDA) structured product labels (SPLs) and scientific journal articles. Rule- and dictionary-based approaches to this problem may yield excessive false positives because they do not consider context: an ADR term may be indistinguishable from a symptom of a disease. Therefore, we aim to model the language surrounding ADR mentions to provide more precise predictions.
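To illustrate the false-positive problem described above, the following is a minimal sketch of a context-free dictionary matcher. The term list and the two example sentences are hypothetical and are not taken from the study data; the abstract does not describe the dictionary-based method in this detail.

```python
import re

# Hypothetical toy dictionary of ADR terms (assumption, for illustration only).
ADR_TERMS = {"nausea", "headache", "rash"}


def dictionary_match(sentence, terms=ADR_TERMS):
    """Return (start, end, term) for every word that appears in the dictionary.

    The matcher looks only at the word itself, never at the surrounding
    language, so it cannot tell a reaction from an indication.
    """
    found = []
    for m in re.finditer(r"[A-Za-z]+", sentence):
        word = m.group().lower()
        if word in terms:
            found.append((m.start(), m.end(), word))
    return found


# "nausea" here is the condition being treated, not a reaction to the drug.
indication = "Indicated for the treatment of nausea associated with chemotherapy."
# "nausea" and "headache" here are genuine adverse reactions.
reaction = "The most common adverse reactions were nausea and headache."

print(dictionary_match(indication))
print(dictionary_match(reaction))
```

Both sentences produce a match for "nausea", although only the second describes an adverse reaction. A model that uses the surrounding words can, in principle, separate the two cases.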

Methods: we randomly selected FDA SPLs by Anatomical Therapeutic Chemical (ATC) class and, from Embase, journal articles containing mentions of drugs and ADRs. We then manually annotated mentions of drugs and ADRs in triplicate and harmonized the annotations to create two separate gold standard data sets: one for SPL content and another for article content. We next used these manual annotations to train a number of ML models, including conditional random field (CRF), bidirectional long short-term memory (BiLSTM) and spaCy models, for the prediction of drug and ADR mentions.
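Sequence labelling models such as CRFs and BiLSTMs are commonly trained on token-level tags derived from span annotations. The sketch below shows one common encoding, the BIO scheme, applied to a hypothetical annotated sentence. The abstract does not state which encoding or tokenizer was used, so the function name, the regular-expression tokenizer and the labels "DRUG" and "ADR" are assumptions.

```python
import re


def bio_tags(sentence, spans):
    """Convert character-offset span annotations to token-level BIO tags.

    spans is a list of (start, end, label) tuples with end exclusive.
    Tokenization is a simple regex split into words and punctuation marks
    (an assumption; a real pipeline would use its own tokenizer).
    """
    tokens, tags = [], []
    for m in re.finditer(r"\w+|[^\w\s]", sentence):
        tag = "O"  # default: token is outside any annotated mention
        for start, end, label in spans:
            if m.start() >= start and m.end() <= end:
                # First token of a mention gets B-, later tokens get I-.
                tag = ("B-" if m.start() == start else "I-") + label
                break
        tokens.append(m.group())
        tags.append(tag)
    return tokens, tags


# Hypothetical example sentence and annotations, not taken from the gold set.
sentence = "Warfarin may cause severe abdominal pain."
adr_start = sentence.index("abdominal pain")
spans = [
    (0, len("Warfarin"), "DRUG"),
    (adr_start, adr_start + len("abdominal pain"), "ADR"),
]

tokens, tags = bio_tags(sentence, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

The resulting token and tag sequences are the kind of input a sequence labeller would be trained on, with the tag of each token predicted from the token and its neighbours.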

Results: our current models, trained on 6234 natural language sentences (5909 unique) containing 9796 ADR annotations (2687 unique) from SPL content, yield mean 5-fold cross-validation precision (P), recall (R) and F1 score (F) of 0.80, 0.78 and 0.79, respectively. For comparison, a dictionary-based method yields P, R and F of 0.57, 0.70 and 0.63, respectively. Inter-annotator agreement ranges from 0.70 to 0.77 (Cohen’s kappa), suggesting that model performance is comparable to human performance in this domain. Separate model development for table content (e.g. tables extracted from SPLs, as opposed to natural language) is ongoing. Gold set manual annotation for the journal article content is also still in progress.
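For readers less familiar with these metrics, the sketch below shows how precision, recall, F1 and Cohen's kappa are conventionally computed. It is an illustration under stated assumptions, not the evaluation code used in the study: the function names are hypothetical, mentions are compared by exact match, and the kappa example uses made-up labels. Note also that a mean of per-fold F1 scores need not equal the F1 of the mean precision and recall, although for the figures reported here the two agree to two decimal places.

```python
from collections import Counter


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over two sets of mentions."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(labels_a) | set(labels_b)
    )
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


# F1 implied by the reported mean precision and recall.
print(round(f1_score(0.80, 0.78), 2))  # ML models
print(round(f1_score(0.57, 0.70), 2))  # dictionary-based method

# Made-up token labels from two hypothetical annotators.
annotator_1 = ["ADR", "ADR", "O", "O"]
annotator_2 = ["ADR", "O", "O", "O"]
print(cohen_kappa(annotator_1, annotator_2))
```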

Conclusions: automatic extraction of ADRs from both highly structured SPLs and less structured journal articles is feasible and represents a viable methodology for fact extraction.