SciLite annotations - Tools - Europe PMC

SciLite annotations

  1. How to use SciLite Annotations
  2. Information for text miners

How to use SciLite Annotations

What is SciLite and how is it useful?

SciLite highlights biological terms and relations, such as diseases, chemicals or protein interactions, for readers of abstracts and full-text articles. These terms are identified by text-mining algorithms developed by a variety of text-mining groups.

For readers, SciLite makes it easier to scan an article and get a quick overview. It helps in finding key concepts and discovering evidence, such as gene–disease associations or molecular interactions. SciLite enables users to locate the primary data in the text by linking text-mined entities to public life sciences and chemistry databases. The goal of SciLite is to support scientists and database curators in their literature research by harnessing the power of text mining, and to promote the contribution of text miners to the advancement of science.

What types of annotations are available?

SciLite provides annotations for core named entities (e.g. gene/protein names, organisms, diseases, chemicals, Gene Ontology terms, etc.), biological events (e.g. phosphorylation), functional relations (e.g. gene–disease associations, protein–protein interactions), as well as biological functions (e.g. gene function).

Using SciLite

As a reader views an article in a web browser, any annotations associated with it are made available in a menu alongside the article, as shown below.

[Figure: Annotations menu. Screenshot showing SciLite annotations highlighted on a full-text article.]

Users control which concepts they see by checking the corresponding boxes, which highlights the colour-coded annotations in the text. To see a list of individual terms, click the right arrow next to the annotation type. A selection of the terms found most frequently in the text appears, together with up/down navigation buttons that let you jump to occurrences of the selected term in the text.
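
As a rough illustration of the term list and the up/down navigation, the sketch below ranks annotated terms by frequency. It also finds the next or previous occurrence of a chosen term, wrapping around at the ends. It assumes annotations arrive as hypothetical `(term, offset)` pairs, and the wrap-around behaviour is itself an assumption. This is an illustrative sketch, not SciLite's implementation.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation: (term, character offset in the article text).
Annotation = Tuple[str, int]


def most_frequent_terms(annotations: List[Annotation], k: int) -> List[Tuple[str, int]]:
    """Return the k most frequent annotated terms with their counts."""
    return Counter(term for term, _ in annotations).most_common(k)


def jump(annotations: List[Annotation], term: str, current: int, down: bool) -> int:
    """Return the offset of the next (down=True) or previous (down=False)
    occurrence of `term` relative to `current`, wrapping around at the ends."""
    offsets = sorted(offset for t, offset in annotations if t == term)
    if not offsets:
        raise ValueError(f"{term!r} is not annotated in this text")
    if down:
        later = [o for o in offsets if o > current]
        return later[0] if later else offsets[0]
    earlier = [o for o in offsets if o < current]
    return earlier[-1] if earlier else offsets[-1]


anns = [("BRCA1", 10), ("cancer", 40), ("BRCA1", 120),
        ("cancer", 200), ("BRCA1", 310), ("TP53", 400)]
print(most_frequent_terms(anns, 2))        # [('BRCA1', 3), ('cancer', 2)]
print(jump(anns, "BRCA1", 10, down=True))  # 120
print(jump(anns, "BRCA1", 310, down=True)) # 10 (wraps to the first occurrence)
```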

Clicking on the highlighted terms in the text opens a pop-up menu with information about the given annotation (below).

[Figure: Screenshot showing the feedback feature for an annotation.]

The pop-up menu displays a link to the related database record, the source of the annotation, and a feedback link. Where annotations overlap within a sentence, the longest annotation is highlighted, and the individual annotations within the phrase can be seen in the pop-up window.
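
The overlap rule can be sketched as follows, assuming each annotation is a hypothetical half-open character span `[start, end)`. Spans that overlap, directly or through a chain of overlaps, form one group. The longest span in each group is the one highlighted, and the whole group is what a pop-up would list. This is an illustrative sketch, not Europe PMC's code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical: entity type or contributing provider

    @property
    def length(self) -> int:
        return self.end - self.start


def resolve_overlaps(spans: List[Span]) -> List[Tuple[Span, List[Span]]]:
    """Group overlapping spans and pick the longest span of each group
    to highlight; the full group is kept for the pop-up."""
    groups: List[List[Span]] = []
    group_end = -1
    for span in sorted(spans, key=lambda s: (s.start, -s.end)):
        if groups and span.start < group_end:
            groups[-1].append(span)  # overlaps the group being built
            group_end = max(group_end, span.end)
        else:
            groups.append([span])    # no overlap: start a new group
            group_end = span.end
    return [(max(g, key=lambda s: s.length), g) for g in groups]


text = "hereditary breast cancer risk"
spans = [Span(11, 24, "Disease"), Span(0, 24, "Disease"),
         Span(18, 24, "Disease"), Span(25, 29, "Other")]
for shown, members in resolve_overlaps(spans):
    print(text[shown.start:shown.end], len(members))
# hereditary breast cancer 3
# risk 1
```

Because the spans are half-open, spans that merely touch (one ends where the next begins) are treated as separate groups.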

It is critically important that readers find the annotations useful. Readers can provide feedback on each annotation, e.g. marking incorrect annotations or endorsing useful ones. This feedback goes to the Europe PMC team, who act on it to improve the annotations overall. If you find an incorrect annotation, or one that is too generic and highlighted too often, click or tap the highlighted term and use the Feedback link in the pop-up window to report it. You can use the same Feedback link to endorse annotations that are useful to you.

Information for text miners

Europe PMC is a community platform, open for contributions that enhance our interaction with the scientific literature. SciLite enables text miners to showcase their work to the wider public. We welcome contributions from text-mining and other associated communities and encourage them to share annotations on the SciLite platform. Any text-mining group can participate by providing their annotations in a specific format described below.

Getting started

If you are a text-mining group and can supply annotations in the required format (see below), please contact us by email to set up an account. Annotations may be generated on your own local setup or on a virtual machine on the EBI Embassy Cloud.

Ground rules

  • Annotations appear on all abstracts, and on full-text articles with a CC-BY, CC-BY-NC or CC0 license.
  • Annotations must be formatted according to the data descriptions below to be displayed in SciLite. They can be produced on an ongoing basis or supplied as a static dataset, but the provenance of each annotation must be clear.
  • Where annotations overlap, the longest one is highlighted, and the individual contributors can be seen in the pop-up menu.
  • The reader is in control of which annotations they see, and can provide feedback if they wish.
  • Europe PMC does not make quality judgements on the annotations: that is left to the readers and the wider text-mining community.
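
The license rule in the first bullet amounts to a simple eligibility check. The sketch below assumes license strings such as `CC BY` or `cc by-nc`, and normalises case, spacing and hyphens before comparing. Only the three licenses listed are accepted. The function names and section labels are hypothetical, and real article metadata may encode licenses differently.

```python
from typing import Optional

# Licenses named in the ground rules, in a normalised hyphenated form.
FULL_TEXT_LICENSES = {"CC-BY", "CC-BY-NC", "CC0"}


def normalise_license(license_str: str) -> str:
    """Map spellings such as 'cc by-nc' or 'CC BY NC' to 'CC-BY-NC'."""
    return "-".join(license_str.upper().replace("-", " ").split())


def annotations_shown(section: str, license_str: Optional[str]) -> bool:
    """Hypothetical check: are annotations displayed on this part of an article?"""
    if section == "abstract":
        return True  # all abstracts carry annotations
    if section == "full_text":
        return (license_str is not None
                and normalise_license(license_str) in FULL_TEXT_LICENSES)
    raise ValueError(f"unknown section: {section!r}")


print(annotations_shown("full_text", "CC BY"))     # True
print(annotations_shown("full_text", "CC BY-ND"))  # False
```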

Data requirements

We chose the W3C Web Annotation Data Model as an emerging generic standard for web annotations, meaning that any annotations displayed within SciLite can be shared, reused and integrated with other types of annotation such as comments. Once concepts of interest have been identified within the text, they are formatted accordingly, and stored in a triple store via the EMBL-EBI RDF Platform.

  • Annotation type: Genes/proteins, molecular interactions, gene–disease associations, biological events (e.g. glycosylation)
  • Accession no./ID: Depending on the data being annotated, the corresponding IDs or accession numbers must be provided, for instance, UniProt accessions for gene- or protein-based annotations.
  • PMC ID: The PMC ID of each annotated article.
  • URI scheme: Using common URI schemes aids data integration with RDF. For instance, we use canonical URIs where existing databases provide stable URIs, e.g. UniProt. Where no canonical URIs are available, the registry of scientific identifiers is used.

We strongly encourage the providers to pay close attention to the URI schemes used.
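
A provider might check records against these requirements before submission. The sketch below assumes a hypothetical record with `type`, `accession` and `pmcid` keys and uses UniProt's published accession pattern. It also assumes UniProt's `http://purl.uniprot.org/uniprot/` namespace as the canonical URI prefix. It covers only gene/protein annotations; other annotation types would need their own identifier checks.

```python
import re
from typing import Dict, List

# UniProt accession format as documented by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)
# PMC IDs: "PMC" followed by digits, e.g. PMC2761928.
PMCID = re.compile(r"PMC[0-9]+")
# Assumed canonical URI prefix for UniProt entries.
UNIPROT_URI = "http://purl.uniprot.org/uniprot/"


def check_record(record: Dict[str, str]) -> List[str]:
    """Return the problems found in a hypothetical submission record."""
    problems = []
    if not PMCID.fullmatch(record.get("pmcid", "")):
        problems.append("pmcid should be 'PMC' followed by digits")
    if record.get("type") == "gene/protein" and not UNIPROT_ACCESSION.fullmatch(
        record.get("accession", "")
    ):
        problems.append("gene/protein annotations need a UniProt accession")
    return problems


def canonical_uri(accession: str) -> str:
    """Build the canonical UniProt URI for a validated accession."""
    if not UNIPROT_ACCESSION.fullmatch(accession):
        raise ValueError(f"not a UniProt accession: {accession!r}")
    return UNIPROT_URI + accession


record = {"type": "gene/protein", "accession": "B3CJ46", "pmcid": "PMC2761928"}
print(check_record(record))     # []
print(canonical_uri("B3CJ46"))  # http://purl.uniprot.org/uniprot/B3CJ46
```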

Example files

Annotation model

For more information refer: and

Provenance model

Data provenance vocabulary:

GeneRIF annotation in RDF

@prefix annotations: <> .
@prefix dc: <> .
@prefix dcterms: <> .
@prefix epmc: <> .
@prefix oa: <> .
@prefix orb: <> .
@prefix provenance: <> .
@prefix rdf: <> .
@prefix uniprot: <> .
@prefix void: <> .
@prefix xsd: <> .

annotations:PMC2761928\#8 a oa:Annotation ;
    void:inDataset provenance:2016-04-29 ;
    oa:hasBody <> ;
    oa:hasTarget uniprot:B3CJ46 .
<> a oa:SpecificResource ;
    dcterms:isPartOf [ a orb:Header ;
        dcterms:hasPart dcterms:title ] ;
    dc:description "The killing activity of the McbC protein raises the possibility that it might serve to lyse other M. catarrhalis strains that lack the mcbABCI locus" ;
    oa:hasRole oa:highlighting ;
    oa:hasSelector <,148> ;
    oa:hasSource epmc:PMC2761928 .
<,148> a oa:FragmentSelector ;
    rdf:value "line=0,148" ;
    dcterms:conformsTo <> .
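
To make the mapping from annotation fields to triples concrete, the sketch below emits a structure comparable to the GeneRIF example as N-Triples. Full IRIs are used, so no prefix declarations are needed. The `rdf`, `dc`, `oa` and VoID namespaces are the standard ones. The `example.org` base IRIs, the `-body`/`-selector` IRI suffixes and the function names are hypothetical placeholders. Only a subset of the example's properties is generated.

```python
from typing import List, Tuple

# Standard vocabulary namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
OA = "http://www.w3.org/ns/oa#"
VOID = "http://rdfs.org/ns/void#"
# Hypothetical base IRIs standing in for the Europe PMC namespaces.
ANNOTATIONS = "http://example.org/annotations/"
ARTICLES = "http://example.org/articles/"
DATASETS = "http://example.org/provenance/"

Triple = Tuple[str, str, str]


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    for char, escape in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(char, escape)
    return f'"{text}"'


def annotation_triples(pmcid: str, n: int, dataset: str, target: str,
                       sentence: str, start: int, end: int) -> List[Triple]:
    """Triples for one annotation: the body is a text span of the article and
    the target is the database record, mirroring the GeneRIF example."""
    ann = f"{ANNOTATIONS}{pmcid}#{n}"
    body, selector = ann + "-body", ann + "-selector"  # hypothetical IRIs
    return [
        (iri(ann), iri(RDF + "type"), iri(OA + "Annotation")),
        (iri(ann), iri(VOID + "inDataset"), iri(DATASETS + dataset)),
        (iri(ann), iri(OA + "hasBody"), iri(body)),
        (iri(ann), iri(OA + "hasTarget"), iri(target)),
        (iri(body), iri(RDF + "type"), iri(OA + "SpecificResource")),
        (iri(body), iri(DC + "description"), literal(sentence)),
        (iri(body), iri(OA + "hasSource"), iri(ARTICLES + pmcid)),
        (iri(body), iri(OA + "hasSelector"), iri(selector)),
        (iri(selector), iri(RDF + "type"), iri(OA + "FragmentSelector")),
        (iri(selector), iri(RDF + "value"), literal(f"line={start},{end}")),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


sentence = ("The killing activity of the McbC protein raises the possibility that it "
            "might serve to lyse other M. catarrhalis strains that lack the mcbABCI locus")
triples = annotation_triples("PMC2761928", 8, "2016-04-29",
                             "http://purl.uniprot.org/uniprot/B3CJ46", sentence, 0, 148)
print(to_ntriples(triples))
```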

Provenance graph

@prefix dc: <> .
@prefix purl: <> .
@prefix rdf: <> .
@prefix rdfs: <> .
@prefix void: <> .
@prefix provenance: <> .

provenance:2016-04-29 a void:Dataset ;
    dc:description "GeneRIF produced by Bibliomics and Text Mining group at the HES-SO, Geneva and Europe PMC, EMBL-EBI, Hinxton" ;
    dc:publisher <> ;
    dc:title "Gene Reference into Function (GeneRIF)" ;
    purl:importedBy <> ;
    purl:importedOn "2016-05-03" ;
    purl:version "2016-04-29" ;
    void:triples "251602" .
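
The `void:triples` count and the version fields in the provenance graph could be derived when a dataset is loaded. The sketch below counts distinct statements in an N-Triples dump, assuming one statement per line. It then renders a dataset description using the same prefixed names as the graph above. Prefix declarations and the publisher and importer IRIs are left out. The function names are hypothetical.

```python
from datetime import date


def count_statements(ntriples: str) -> int:
    """Count distinct statements in an N-Triples dump, assuming one statement
    per line; blank lines and '#' comment lines are skipped."""
    lines = {line.strip() for line in ntriples.splitlines()}
    return sum(1 for line in lines if line and not line.startswith("#"))


def quote(text: str) -> str:
    """Quote a plain Turtle string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def dataset_description(version: str, title: str, description: str,
                        ntriples: str, imported_on: date) -> str:
    """Render a void:Dataset description with the provenance graph's
    prefixed names (prefix declarations not included)."""
    return (
        f"provenance:{version} a void:Dataset ;\n"
        f"    dc:description {quote(description)} ;\n"
        f"    dc:title {quote(title)} ;\n"
        f"    purl:importedOn {quote(imported_on.isoformat())} ;\n"
        f"    purl:version {quote(version)} ;\n"
        f"    void:triples {quote(str(count_statements(ntriples)))} .\n"
    )


dump = "<a> <b> <c> .\n<a> <b> <c> .\n# comment\n<a> <b> <d> .\n"
print(dataset_description("2016-04-29", "Gene Reference into Function (GeneRIF)",
                          "GeneRIF sample", dump, date(2016, 5, 3)))
```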