Open thread

exploring

Can scientific literature be made genuinely navigable — not just searchable?

A ranked list answers one question: which documents match these keywords. It says nothing about how a field is shaped, which results build on which, where two communities are quietly disagreeing, or which corner nobody has looked at yet. The work at SciX has been about putting that structure on top of the ADS corpus, using embeddings for meaning, citation graphs for how the ideas connect, and controlled vocabularies to keep the grounding honest. What sits below is the current state of that, and the papers I read to push on it.

retrieval · embeddings · knowledge-graphs · scientific-search
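As a toy illustration of what embeddings plus a citation graph can surface that a ranked list cannot, the sketch below looks for papers that sit close together in embedding space but have no citation link between them. This is not the SciX implementation: the function name `uncited_neighbors`, the similarity threshold, the two-dimensional vectors, and the paper identifiers are all made up for the example.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def uncited_neighbors(embeddings, citations, threshold=0.9):
    """Return (paper, paper, similarity) for pairs that are semantically
    close but not connected by a citation in either direction."""
    # Treat citations as undirected for this check.
    linked = {frozenset(edge) for edge in citations}
    pairs = []
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine(embeddings[a], embeddings[b])
        if sim >= threshold and frozenset((a, b)) not in linked:
            pairs.append((a, b, round(sim, 3)))
    # Most similar unlinked pairs first.
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical toy data: real document embeddings have hundreds of dimensions.
embeddings = {
    "paper_a": [1.0, 0.0],
    "paper_b": [0.9, 0.1],
    "paper_c": [0.0, 1.0],
    "paper_d": [1.0, 0.05],
}
citations = [("paper_b", "paper_a")]  # paper_b cites paper_a

result = uncited_neighbors(embeddings, citations)
for a, b, sim in result:
    print(f"{a} and {b} look related (cos={sim}) but do not cite each other")
```

In this toy data, `paper_a` and `paper_b` are excluded because they are already linked, while `paper_d` is flagged against both of them. A keyword ranking would return all three without saying which connections are missing.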

The work

Reading path

A generated path through 5 papers, assembled using my SciX literature tools (semantic embeddings, citation graphs, and reading-order signals).

  1. SPECTER: Document-level Representation Learning using Citation-informed Transformers ↗

    2020 · 125 cites

    Start with citation-informed document embeddings: how papers cite each other turns out to teach a model what they mean.

  2. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks ↗

    2020 · 463 cites

    Then the case for domain adaptation: generic models leave signal on the table in specialized corpora.

  3. Building astroBERT, a Language Model for Astronomy & Astrophysics ↗

    2024 · 17 cites

    astroBERT applies that to astronomy: a domain model trained on the same ADS corpus I work in.

  4. Experimenting with Large Language Models and vector embeddings in NASA SciX ↗

    2023 · 2 cites

    Our SciX experiments: embeddings + vector search over the live literature, and what broke.

  5. Knowledge Graphs ↗

    2020 · 118 cites

    Close on knowledge graphs, the structural layer ranked retrieval can never give you.

Open literature

From my ADS library

Scientific Search & SciX · 84 papers · Browse all →

Pulled from my curated ADS library.

Explore the links