Cosmos

Cosmos is an open-source semantic search engine focused on retrieving information from PDF documents. Although it was created to automate scientific discovery and analysis, its components can be applied to any collection of documents.

Cosmos is composed of three parts:

1. Ingestion of information in PDF documents. Cosmos automates the process of extracting text, tables, figures, and other components commonly found in PDF documents.

2. Retrieval of information across stacks of documents. Cosmos uses Elasticsearch and an optional neural reranker.

3. Extraction of information. Cosmos is also packaged with a question answering model that can be deployed to answer queries with answer spans taken from the passages returned by the retrieval module.
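
To make the retrieve-then-rerank idea in part 2 concrete, here is a minimal, self-contained Python sketch. It is not the Cosmos API: the passages, the function names (`retrieve`, `rerank`, `bigram_score`), and the scoring rules are all hypothetical. Term overlap stands in for Elasticsearch, and bigram overlap stands in for the neural reranker.

```python
# Toy retrieve-then-rerank pipeline. All names and data here are
# illustrative assumptions, not part of Cosmos.

# Hypothetical passages standing in for text extracted from PDFs.
PASSAGES = [
    "Table 2 reports the porosity of sandstone samples.",
    "The porosity of the sandstone samples ranged from 5 to 20 percent.",
    "Figure 3 shows the sampling locations along the river.",
]

QUERY = "What is the porosity of the sandstone samples?"


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [tok.strip(".,?").lower() for tok in text.split()]


def retrieve(query, passages, k=2):
    """First stage (stand-in for Elasticsearch): rank passages by the
    number of distinct query terms they contain and keep the top k."""
    query_terms = set(tokenize(query))
    scored = [(len(query_terms & set(tokenize(p))), p) for p in passages]
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for score, p in ranked[:k] if score > 0]


def bigram_score(query, passage):
    """Stand-in for a neural reranker: count shared word bigrams, which
    rewards passages that preserve the query's phrasing."""
    def bigrams(tokens):
        return set(zip(tokens, tokens[1:]))
    return len(bigrams(tokenize(query)) & bigrams(tokenize(passage)))


def rerank(query, candidates, score_fn=bigram_score):
    """Second stage: reorder the retrieved candidates with a finer score."""
    return sorted(candidates, key=lambda p: score_fn(query, p), reverse=True)


if __name__ == "__main__":
    candidates = retrieve(QUERY, PASSAGES, k=2)
    for passage in rerank(QUERY, candidates):
        print(passage)
```

The first stage is cheap and only needs to surface plausible candidates. The second stage can use a more expensive score because it sees only the top `k` passages. A question answering model, as in part 3, would then read the top-ranked passages to select an answer span.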