Semantic Scholar Labs

Demos and open-source tools from the Semantic Scholar research team at AI2


Citeomatic - Machine Assisted Literature Review

Citeomatic is a deep learning model for the citation prediction task. Unlike previous work, Citeomatic is specifically trained to learn a robust model whose predictions remain meaningful even when they are wrong. Because it relies only on the title and abstract of a query paper, Citeomatic can also serve as a useful literature review tool at any stage of the writing process.
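The following sketch makes the task setup concrete. It is a toy stand-in, not Citeomatic's model. Given only a query paper's title and abstract, it ranks candidate papers as likely citations. The sketch replaces the learned deep representation with plain bag-of-words term counts and ranks candidates by cosine similarity. The function names (`vectorize`, `cosine`, `predict_citations`) and the paper fields `id`, `title`, and `abstract` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def vectorize(title: str, abstract: str) -> Counter:
    """Term counts over title + abstract.

    Hypothetical stand-in for a learned embedding; Citeomatic itself uses a
    trained deep model, which this sketch does not reproduce.
    """
    return Counter(tokenize(title + " " + abstract))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_citations(query: dict, candidates: list[dict], k: int = 3) -> list[str]:
    """Return the ids of the k candidates most similar to the query paper.

    Only the query's title and abstract are used, mirroring the input the
    text describes. Ties are broken by candidate id for determinism.
    """
    q = vectorize(query["title"], query["abstract"])
    scored = [
        (cosine(q, vectorize(c["title"], c["abstract"])), c["id"])
        for c in candidates
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [paper_id for _, paper_id in scored[:k]]


# Usage with a tiny made-up candidate pool (illustrative data only).
candidates = [
    {"id": "p1", "title": "Neural citation recommendation",
     "abstract": "We recommend citations using neural networks."},
    {"id": "p2", "title": "Protein folding dynamics",
     "abstract": "Molecular simulation of proteins."},
    {"id": "p3", "title": "Citation graphs",
     "abstract": "Analysis of citations in scientific papers."},
]
query = {"title": "Recommending citations",
         "abstract": "A neural model that predicts citations for a draft paper."}

print(predict_citations(query, candidates, k=2))  # ['p1', 'p3']
```

Ranking by similarity keeps the example short. A learned model like the one described above would replace `vectorize` with a trained encoder, so that related papers score highly even when they share few exact words.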

Open Corpus

Open Corpus (Open Scholarly Research Corpus) is a large-scale corpus of academic literature containing 7.2 million academic papers, drawn primarily from computer science and neuroscience. In addition to the contents of the papers, Open Corpus provides metadata (e.g., titles, authors, abstracts, and incoming and outgoing citations) for the vast majority of the documents. This makes the corpus usable as-is for a variety of


Science Parse - PDF Metadata Extraction

ScienceParse is a state-of-the-art metadata extractor for scholarly articles.

It extracts important scholarly metadata from PDFs, including titles, author information, abstracts, sections, and references.


Deep Figures - PDF Image and Caption Extractor

The DeepFigures project uses deep learning to extract both image-based and text-based figures from academic PDFs with high precision and recall.