medrxivr: A new tool for searching for and retrieving records and PDFs from the medRxiv preprint repository

Article type
Authors
McGuinness L¹, Schmidt L¹
¹Department of Population Health Sciences, University of Bristol
Abstract
Background:
The medRxiv (pronounced “med-archive”) repository, which hosts health-related manuscripts uploaded prior to formal peer review and publication, is a key source of grey literature for systematic reviewers. However, its current web-based interface presents several challenges: only relatively simple searches can be performed, and there is no option to bulk-download either the metadata (e.g. title, abstract, subject category) or the full-text PDFs of records identified by a search. We sought to address these issues via a new tool, medrxivr.

Objectives:
To develop a new tool that allows users to search the medRxiv preprint repository data using complex search strategies, and to download metadata (e.g. title, abstract, authors, subject category) and full-text PDFs for records identified by their search.

Methods:
The R programming environment was used to create a snapshot of the medRxiv preprint repository, and to develop a new R package, medrxivr, and an associated web-based application, both of which allow users to query the snapshot.

Results:
The baseline snapshot of the medRxiv database was created in November 2019. This snapshot is updated daily to capture new records added to the repository.
The medrxivr R package and associated web application (package: bit.ly/medrixvr-package, app: bit.ly/medrxivr-app) were made available in March 2020 and allow users with varying levels of ability in R to search the snapshot for relevant articles. Search strategies can combine Boolean logic (AND, OR, NOT) with regular expression syntax (e.g. “[Tt]est” finds both “Test” and “test”), and results can be filtered by date of publication. As records on medRxiv can be updated, users can also choose to retrieve all versions of a given record or only the most recent one.
Identified records can then be passed to a helper function in the R package, which automatically downloads the full-text PDF of each record to aid the full-text screening process.
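The search behaviour described above (Boolean operators, regular expressions, date filtering, and the choice between all versions and the most recent one) can be illustrated with a minimal sketch. It is written in plain Python purely for illustration and is not the medrxivr API: the `search` function, its parameters, the record fields, and the example DOIs are all hypothetical, and the real package may organise these steps differently.

```python
import re
from datetime import date

# Hypothetical records standing in for a metadata snapshot.
# The DOIs, titles, and abstracts are invented for this example.
RECORDS = [
    {"doi": "10.1101/example-0001", "version": 1, "date": date(2019, 12, 1),
     "title": "Test accuracy in dementia", "abstract": "A diagnostic study."},
    {"doi": "10.1101/example-0001", "version": 2, "date": date(2020, 1, 15),
     "title": "Test accuracy in dementia", "abstract": "A revised diagnostic study."},
    {"doi": "10.1101/example-0002", "version": 1, "date": date(2020, 2, 3),
     "title": "A new test for influenza", "abstract": "Evaluated in mice."},
    {"doi": "10.1101/example-0003", "version": 1, "date": date(2020, 2, 20),
     "title": "Dementia care pathways", "abstract": "A qualitative study."},
]


def search(records, all_of=(), none_of=(), from_date=None, to_date=None,
           latest_only=True):
    """Return records matching a Boolean combination of regex patterns.

    all_of:  list of topics; patterns within a topic are combined with OR,
             and the topics themselves are combined with AND.
    none_of: patterns that exclude a record if any of them match (NOT).
    """
    hits = []
    for rec in records:
        text = rec["title"] + " " + rec["abstract"]
        # AND across topics, OR within each topic.
        if not all(any(re.search(p, text) for p in topic) for topic in all_of):
            continue
        # NOT: drop the record if any exclusion pattern matches.
        if any(re.search(p, text) for p in none_of):
            continue
        # Optional date window (inclusive at both ends).
        if from_date is not None and rec["date"] < from_date:
            continue
        if to_date is not None and rec["date"] > to_date:
            continue
        hits.append(rec)

    if latest_only:
        # Keep the highest version among the matching records for each DOI.
        latest = {}
        for rec in hits:
            current = latest.get(rec["doi"])
            if current is None or rec["version"] > current["version"]:
                latest[rec["doi"]] = rec
        hits = list(latest.values())
    return hits


# ("Test" OR "test") AND ("Dementia" OR "dementia" OR "influenza") NOT "mice"
results = search(
    RECORDS,
    all_of=[["[Tt]est"], ["[Dd]ementia", "influenza"]],
    none_of=["mice"],
)
for rec in results:
    print(rec["doi"], "version", rec["version"])
```

One design choice in this sketch is that deduplication happens after filtering, so an earlier version can be returned when only that version falls inside the date window. Passing `latest_only=False` returns every matching version instead.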

Conclusions:
medrxivr enables systematic reviewers to search for and retrieve relevant metadata and full-text PDFs for articles in the medRxiv preprint repository.

Patient or healthcare consumer involvement:
As this was a methods-focused project, no patients or healthcare consumers were directly involved in the tool’s development. However, a tool that helps users search (and retrieve data from) a health-focused preprint repository such as medRxiv will promote the production of systematic reviews that incorporate this source of grey literature, resulting in a more accurate and up-to-date summary of all available evidence, thus maximising the relevance of these reviews to patients and the public.