Research Alert
Background: The current COVID-19 crisis underscores the importance of preprints, as they allow research results to be communicated rapidly, without the delay of peer review. To fully integrate this type of publication into library information systems, we developed preVIEW: a publicly available, central search engine for COVID-19–related preprints, which clearly distinguishes this source from peer-reviewed publications. The relationship between the preprint version and its corresponding journal version should be stored as metadata in both versions so that duplicates can be easily identified and information overload for researchers is reduced.
Objective: In this work, we investigated the extent to which the relationship information between preprint and corresponding journal publication is present in the published metadata, how it can be further completed, and how it can be used in preVIEW to identify already republished preprints and filter those duplicates in search results.
Methods: We first analyzed the information content available at the preprint servers themselves and the information that can be retrieved via Crossref. Moreover, we developed the algorithm Pre2Pub to find the corresponding reviewed article for each preprint. We integrated the results of those different resources into our search engine preVIEW, presented the information in the result set overview, and added filter options accordingly.
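The Crossref lookup mentioned in the Methods can be illustrated with a minimal sketch. It assumes a work record shaped like the `message` object returned by the Crossref REST API, where a preprint may carry a `relation` entry named `is-preprint-of`. The function names and the sample DOIs are hypothetical placeholders, and this is not the Pre2Pub algorithm, whose matching logic is not described here.

```python
from typing import Any


def published_dois(work: dict[str, Any]) -> list[str]:
    """Return the journal DOIs a preprint record links to, if any.

    Assumes the Crossref-style layout:
    work["relation"]["is-preprint-of"] = [{"id-type": "doi", "id": "..."}]
    """
    relations = work.get("relation") or {}
    targets = relations.get("is-preprint-of") or []
    # Keep only DOI-typed targets; DOIs are case-insensitive, so normalize.
    return [
        t["id"].lower()
        for t in targets
        if t.get("id-type") == "doi" and t.get("id")
    ]


def split_by_publication_status(
    works: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split preprint records into (already published, not yet linked)."""
    published = [w for w in works if published_dois(w)]
    unlinked = [w for w in works if not published_dois(w)]
    return published, unlinked


# Hypothetical sample records; the DOIs are placeholders, not real works.
sample_works = [
    {
        "DOI": "10.1101/example-preprint-a",
        "relation": {
            "is-preprint-of": [
                {"id-type": "doi", "id": "10.1000/Example-Journal-A"}
            ]
        },
    },
    {"DOI": "10.1101/example-preprint-b", "relation": {}},
    {"DOI": "10.1101/example-preprint-c"},
]

published, unlinked = split_by_publication_status(sample_works)
```

Records that end up in `unlinked` are the ones for which the metadata alone gives no answer, which is the gap a matching algorithm such as Pre2Pub is meant to close.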
Results: Preprints have found their place in publication workflows; however, the link from a preprint to its corresponding journal publication is not completely covered in the metadata of the preprint servers or in Crossref. Our algorithm Pre2Pub is able to find approximately 16% more related journal articles with a precision of 99.27%. We also integrated this information in a transparent way within preVIEW so that researchers can use it in their search.
Conclusions: The relationship between the preprint version and its journal version is valuable information that can help researchers find only previously unknown information in preprints. As long as there is no transparent and complete way to store this relationship in metadata, the Pre2Pub algorithm is a suitable extension to retrieve this information.