Integrative neuroscience research needs a scalable informatics framework that enables semantic integration of diverse types of neuroscience data. This paper describes the use of the Web Ontology Language (OWL) and other Semantic Web technologies for the representation and integration of molecular-level data provided by several databases of the SenseLab suite of neuroscience databases.
Based on the original database structure, we semi-automatically translated the databases into OWL ontologies with manual addition of semantic enrichment. The SenseLab ontologies are extensively linked to other biomedical Semantic Web resources, including the Subcellular Anatomy Ontology, Brain Architecture Management System, the Gene Ontology, BIRNLex and UniProt. The SenseLab ontologies have also been mapped to the Basic Formal Ontology and Relation Ontology, which helps ease interoperability with many other existing and future biomedical ontologies for the Semantic Web. In addition, approaches to representing contradictory research statements are described. The SenseLab ontologies are designed for use on the Semantic Web that enables their integration into a growing collection of biomedical information resources.
We demonstrate that our approach can yield significant potential benefits and that the Semantic Web is rapidly becoming mature enough to realize its anticipated promises. The ontologies are available online at http://neuroweb.med.yale.edu/senselab/
Neuroscience is in need of a new informatics framework that enables semantic integration of diverse data sources. Experimental data are collected across different scales, from cell to tissue to organ, using a wide variety of experimental procedures taken from diverse disciplines. Unfortunately, the information systems holding these data do not link related data to one another, preventing effective research that could combine the data to achieve new insights. Integrative neuroscience research is key to providing a better understanding of many neurological diseases such as Alzheimer's Disease and Parkinson's Disease, and could potentially lead to better prevention, diagnosis and treatment of such diseases. The Semantic Web, a maturing set of technologies and standards backed by the World Wide Web Consortium, offers technical guidance specifically in the area of aggregating and integrating diverse information resources. These Semantic Web technologies can be used to integrate neuroscience knowledge and to make such integrated knowledge more easily accessible to researchers. The foundational technologies of the Semantic Web, namely the Resource Description Framework (RDF), the Web Ontology Language (OWL), and the SPARQL Protocol and RDF Query Language (SPARQL), are widely implemented and are backed by a large community of users and developers. The chief advantages of Semantic Web technologies include 1) the widely supported standards backed by the World Wide Web Consortium, 2) the ability to make use of the well-established inference mechanisms of description logics, and 3) the availability of a wide range of software tools.
A demonstration of Semantic Web technologies in the neuroscience domain [5-7] has been carried out, in the context of translational research, by the Semantic Web for Health Care and Life Science Interest Group of the World Wide Web Consortium. A major goal of translational research is to accelerate the bidirectional communication between basic research and clinical practice, in order to speed up the development of new clinical guidelines, tests, and therapies. The Semantic Web has the potential to facilitate the aggregation and integration of information from different institutions involved in this process.
As part of this community effort, we have created a Semantic Web framework for neuroscience research, based on the SenseLab collection of databases. SenseLab is a highly accessed information resource for neuroscience research on the Web. Another motivation for converting SenseLab into Semantic Web format was that the “entity-attribute-value with classes and relationships” schema (EAV/CR) on which SenseLab's architecture is based bears considerable resemblance to RDF, which facilitates the conversion of SenseLab into a Semantic Web format such as RDF. In fact, we have written a program to automatically convert SenseLab databases into the corresponding RDF structure. Such converted RDF-formatted data can then be loaded into an RDF store (e.g., Oracle RDF Data Model) for RDF-based querying. While we have demonstrated that a straightforward syntactic conversion can be done automatically, the RDF representation has limited expressivity and reusability. For example, RDF is mostly focused on the description of instances and does not allow for the detailed description of class properties, relations between classes, and automated classification that is central to our integration efforts. It does not offer constructs to describe sameness between entities from different data sources. RDF also lacks important features to enforce consistency checks to identify erroneous and contradictory statements, which is an essential feature when large, complex information repositories need to be merged.
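The resemblance between an entity-attribute-value row and an RDF triple noted above can be illustrated with a minimal sketch. This is not the conversion program described in the text; the row layout, the `BASE` namespace and the sample rows are assumptions made purely for illustration.

```python
from urllib.parse import quote

# Hypothetical namespace; the real SenseLab URIs are not reproduced here.
BASE = "http://example.org/senselab/"

def eav_row_to_ntriple(entity, attribute, value, value_is_entity=False):
    """Render one (entity, attribute, value) row as an N-Triples line."""
    subject = f"<{BASE}{quote(entity)}>"
    predicate = f"<{BASE}{quote(attribute)}>"
    if value_is_entity:
        # The value refers to another entity, so it becomes a URI node.
        obj = f"<{BASE}{quote(value)}>"
    else:
        # Plain values become string literals with minimal escaping.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"{subject} {predicate} {obj} ."

# Illustrative rows only; not taken from the SenseLab databases.
rows = [
    ("CA1_pyramidal_neuron", "has_receptor", "GABA_receptor", True),
    ("CA1_pyramidal_neuron", "label", "CA1 pyramidal neuron", False),
]
ntriples = [eav_row_to_ntriple(*row) for row in rows]
```

Each row maps onto exactly one subject-predicate-object statement, which is why a purely syntactic conversion is straightforward, and also why it carries no class-level semantics.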
To overcome these limitations, we use a more expressive ontology language, the Web Ontology Language (OWL), for representing richer semantics and logical statements. In addition, we adopt the current ontological standards and best practices in the process of creating the SenseLab ontologies. A goal is to allow the ontologies to have broad interoperability and reusability.
SenseLab consists of a number of specialized databases, three of which we have converted to the Semantic Web format: NeuronDB, BrainPharm and ModelDB. NeuronDB contains descriptions of anatomic locations, cell architecture and physiologic parameters (membrane properties consisting of transmitters, receptors and ionic channels) of neuronal cells based on compartmental models of neurons (Fig. 1). The pilot BrainPharm database is intended to support research on drugs for the treatment of neurological disorders. It enhances the descriptions in a portion of NeuronDB with descriptions of the actions of pathological and pharmacological agents. ModelDB is a large repository of computational neuroscience models and simulations. The computational models in ModelDB are annotated with references to NeuronDB. Taken together, these databases allow the researcher to query information and to run simulations pertaining to the function of neurons in healthy and disease states. The NeuronDB and ModelDB databases contain literature references and excerpts from texts that have been used to curate the database entries. This allows the users of SenseLab to verify the information in the database and can act as a starting point for further literature searches. The highly interconnected and hierarchical nature of these scientifically annotated data makes them suitable candidates for the creation of a Semantic Web resource in neuroscience.
This section describes the process of constructing the ontologies and converting data extracted from the SenseLab databases into the ontological structure. In addition, we discuss how to establish mappings from SenseLab ontologies to other existing ontologies. Finally, we mention the quality control and reasoning capability supported by OWL.
An ontology ‘scaffold’ made up of basic class hierarchies and relations was manually created, based on the structure of existing SenseLab databases. This scaffold could not be created by an automated process, since some of the structures and entity labels in the database needed to be slightly changed and re-interpreted to create a logically consistent and well-designed ontology.
The design of this scaffold was inspired by the realism described by Smith. The ontologies are primarily organized around direct representations of physical objects and processes (e.g., neuronal cells, ionic currents) in reality, and not around their abstractions (e.g., concepts and database entries). This approach has already been adopted for developing standard biomedical ontologies like those included in the Open Biomedical Ontologies Foundry (OBO Foundry), one of the widely recognized community projects in the area of biomedical ontologies.
The scaffold contains basic classes from the domain of neuroscience, such as ‘brain region’, ‘neuron’, ‘gene’, and ‘serotonin receptor’ (subclass of ‘receptor’). It provides the semantic foundation for data querying, integration and inferencing. For example, based on certain user-defined relationships (e.g., a gene encodes a receptor) between different classes, semantic queries can be formulated to answer focused neuroscientific research questions (e.g., serotonin receptors are found in specific type(s) of neurons). Based on the hierarchical relationship between brain regions, we can infer child/parent regions at any level automatically. Some of the classes (e.g., neurons) can serve as a unit of integration across different data sources. For example, research statements about a particular neuron may be integrated from different databases.
For editing and viewing the SenseLab ontologies, we evaluated several OWL ontology editors including Protégé 3.2, Swoop 2.3 alpha [14, 15] and TopBraid Composer 2.0. While the first two are open source, the third is a commercial product. We started with Protégé but experienced some difficulties: i) certain uniform resource identifiers (URIs) that could not be decomposed into XML QNames were not displayed correctly, ii) namespaces and ontology import hierarchies were not handled as expected, and iii) some of the statements automatically created by Protégé did not adhere to the OWL DL standard. While we did not encounter these problems when using Swoop and TopBraid Composer, these ontology editors were not as stable as we had expected. To sum up, more stable, standards-compliant and robust ontology editors are needed for serious ontology design and editing.
The ontologies were mainly developed by a small group of people, and no dedicated software for collaborative ontology editing was used. This worked well for the scope of the current SenseLab ontologies. However, if future SenseLab ontology development involves a wider scope and a greater number of participants, it will make sense to use such software to minimize versioning conflicts.
The ontologies were built upon established foundational ontologies in order to maximize the interoperability with other existing and forthcoming biomedical Semantic Web resources. These ontologies were the Relation Ontology [17, 18] from the Open Biomedical Ontologies repository (OBO ), which defines basic relations such as ‘part of’, ‘participant of’ or ‘contained in’; and the Basic Formal Ontology (BFO ), which defines basic classes such as ‘process’, ‘object’, ‘quality’ or ‘function’. In , the SenseLab ontologies presented here are listed as one of the primary examples of the application of OBO Foundry resources.
The data from the SenseLab databases were automatically converted to OWL using programs written in Java and Python. The automated export scripts extended the manually created ontology scaffolds through the creation of subclasses, OWL property restrictions and individuals. In OWL ontologies like the ones created for the SenseLab project, the distinction between ‘ontology’, ‘data schema’ and ‘data’ is blurred. The main practical difference between ontology development and data conversion in our project was that the basic ontological structures needed to be developed manually, while the bulk of ‘data’ could be converted through automated processes. The resulting ontologies show no clearly distinguishable divide between a schema and data.
The OWL export of NeuronDB was based on a transformation from the EAV/CR model of the SenseLab database  to RDF serialized as XML (RDF/XML) by a Java program. The transformed information included descriptions of neurons based on research findings (e.g., neuronal receptors, channels and transmitters). The classes and individuals created by these exports were added to the manually created ontology scaffold as subclasses and instances of the classes in the ontology scaffold.
The export from ModelDB and BrainPharm was based on a simple flat text file export of the databases. The text file exports were converted to RDF/XML files with a Python script.
The mapping from neuron receptors to corresponding genes was based upon an automated transformation on the EntrezGene MySQL dump provided by Atlas . In the ontology, the neuron receptors were defined as gene products of genes. The genes in that mapping were identified by their common gene symbols.
Based on this list of gene symbols in the ontology, a mapping between gene symbols and NCBI Entrez Gene record identifiers was generated with the Clone/Gene ID Converter . This service returned the mapping between gene symbols and identifiers as a tab-delimited text file. The mapping in this text file was used for the generation of an RDF/XML file with a Python script. The RDF/XML was then merged with the main NeuronDB ontology file.
Based on the annotation of receptor proteins with gene symbols, receptor proteins were also annotated with Uniprot records that corresponded to the genes. A mapping between gene symbols and Uniprot record identifiers was generated with the SOURCE gene annotation service . Again, the resulting tab-delimited text file was used to generate RDF/XML which was merged with the main NeuronDB ontology file. Literature references in the source database were converted to references to NCBI Pubmed database entries.
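The step from a tab-delimited mapping file to RDF/XML described above might look roughly like the following sketch. It is not the authors' script: the ontology namespace, the `has_gene_record` property name and the sample symbol are hypothetical, and only the Entrez Gene record namespace follows the Science Commons URI scheme used in this paper.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical ontology namespace and property name, for illustration only.
ONT_NS = "http://example.org/neuron_ontology#"
# Record namespace following the Science Commons URI scheme.
GENE_RECORD_NS = "http://purl.org/commons/record/ncbi_gene/"

def mapping_to_rdfxml(tsv_text):
    """Turn 'symbol<TAB>gene_id' lines into an RDF/XML document string."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ont", ONT_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        symbol, gene_id = row[0], row[1]
        node = ET.SubElement(root, f"{{{RDF_NS}}}Description")
        node.set(f"{{{RDF_NS}}}about", ONT_NS + symbol)
        link = ET.SubElement(node, f"{{{ONT_NS}}}has_gene_record")
        link.set(f"{{{RDF_NS}}}resource", GENE_RECORD_NS + gene_id)
    return ET.tostring(root, encoding="unicode")

# 'GENE_A' is a placeholder symbol; its pairing with this ID is illustrative.
sample = "GENE_A\t3579\n"
rdfxml = mapping_to_rdfxml(sample)
```

The resulting RDF/XML string can then be merged with a larger ontology file by any RDF-aware tool, since each mapping row becomes an independent statement.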
For all of these mappings, we used the URI scheme for database record identifiers established by Science Commons . URIs for database records could simply be generated by concatenating the record identifier to a predefined namespace. For example, the Entrez Gene record with ID ‘3579’ was identified by the URI ‘http://purl.org/commons/record/ncbi_gene/3579’, the Uniprot record ‘P46663’ was identified by ‘http://purl.org/commons/record/uniprotkb/P46663’ and the Pubmed record with ID ‘11160518’ was identified by ‘http://purl.org/commons/record/pmid/11160518’.
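The concatenation rule above is simple enough to express as a small helper. The following is a sketch only: the function name and dictionary are hypothetical, while the three namespaces are the ones quoted in the examples above.

```python
# Namespaces taken from the URI examples in the text.
RECORD_NAMESPACES = {
    "ncbi_gene": "http://purl.org/commons/record/ncbi_gene/",
    "uniprotkb": "http://purl.org/commons/record/uniprotkb/",
    "pmid": "http://purl.org/commons/record/pmid/",
}

def record_uri(database, record_id):
    """Concatenate a record identifier onto its database namespace."""
    return RECORD_NAMESPACES[database] + str(record_id)

gene_uri = record_uri("ncbi_gene", 3579)
protein_uri = record_uri("uniprotkb", "P46663")
pubmed_uri = record_uri("pmid", 11160518)
```

Because the URI is a pure function of the database name and the record identifier, independently produced data sets arrive at the same URI for the same record without any coordination.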
It should be noted that all entries in NCBI Gene and Uniprot are specific to certain animal species. While species specificity is indicated in the annotations and in the ModelDB files, the data entries in the NeuronDB ontology are species-agnostic — they provide general descriptions of mammalian and arthropod physiology, which covers a wide variety of use cases. In cases where species-specific information is required, the textual annotations will be taken into consideration. The NCBI Gene and Uniprot references in the annotations can therefore be seen as species-specific examples, i.e., they do not necessarily cover all homologue proteins from all species.
Research statements in the SenseLab database were interpreted as claims of existence of a certain class of neurons with certain properties, which were added as subclasses to the ontology scaffold. An example of the application of this modelling approach is described in Fig. 2. Information about research statements (e.g., descriptive text, Pubmed references) was attached to these classes. Generalized classes derived from all available research statements for a specific type of neuron were added, which resulted in the three-layer design pattern described in Fig. 2.
The use of OWL reasoning and the creation of manually curated generalizations of research findings make it possible to harness OWL for the formulation of generalized, internally consistent world-views based on changing and often contradictory research findings. The contradictions identified by OWL reasoning in this manner can help in localizing disagreement between different data and hypotheses, and can help in judging the validity of competing hypotheses.
It was found that the design pattern for the representation of research findings and evidence used in the SenseLab ontology was easily expressed consistently in OWL and was well integrated with other ontologies in our collection. Other approaches (e.g., the definition of named RDF subgraphs for each set of research statements) were also considered, but they did not meet these criteria.
The three ontologies representing the SenseLab data were mapped to several related Semantic Web ontologies from the domains of neuroscience and biomedicine: 1) the BAMS ontology (created by John Barkley, National Institute of Standards and Technology, USA) which was derived from the Brain Architecture Management System (BAMS [27, 28]); 2) the Subcellular Anatomy Ontology (SAO ) created by the Cell Centered Database project ; 3) the BirnLex ontology  developed by members of the Biomedical Informatics Research Network ; 4) the Common Anatomy Reference Ontology (CARO ); 5) the Gene Ontology ; 6) the Ontology of Biomedical Investigation (OBI)  (a mapping still quite rudimentary at the time of this writing). URIs from SenseLab ontologies are also referenced in the OWL version of the Psychoactive Drug Screening Program (PDSP) Ki database of receptor-ligand interactions .
The mappings were created by a person with expertise in both ontology engineering and neuroscience, which was indispensable for carrying out this task. They were created with standard ontology editing software. No automated algorithms for ontology mapping were used.
The W3C RDF validator , a web-based tool hosted by the World Wide Web consortium, was used for checking well-formedness of RDF/XML and basic RDF syntax validation. The Java-based reasoner Pellet 1.4 [38, 39] was used for consistency checking and classification. It turned out to be essential to check syntax and ontological consistency after each major step of ontology development, as both syntactic and semantic errors were often introduced through human error or malfunction of software tools.
OWL inference was used to test which neurons in the database were in accordance with one of the ‘canonical neuronal forms’ described by SenseLab, for example the canonical form “neuron having an axon and apical dendrite”. While such a classification could also be done manually, the use of automated reasoning has the potential to speed up the process and allows flexible re-classification of all neurons whenever the definitions of canonical forms change.
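The idea of classifying neurons against a canonical form can be approximated outside a reasoner with a simple subset test. The sketch below is a deliberately simplified stand-in for OWL classification, not how Pellet works; the neuron names and compartment sets are hypothetical.

```python
# Hypothetical neuron descriptions: neuron name -> set of compartments.
NEURON_PARTS = {
    "neuron_A": {"axon", "apical dendrite", "soma"},
    "neuron_B": {"axon", "soma"},
}

# Canonical form from the text: "neuron having an axon and apical dendrite".
CANONICAL_FORMS = {
    "axon_and_apical_dendrite": {"axon", "apical dendrite"},
}

def classify(neuron_parts, canonical_forms):
    """Map each neuron to the canonical forms whose required parts it has."""
    return {
        neuron: sorted(
            form
            for form, required in canonical_forms.items()
            if required <= parts  # all required compartments are present
        )
        for neuron, parts in neuron_parts.items()
    }

classification = classify(NEURON_PARTS, CANONICAL_FORMS)
```

Changing an entry in `CANONICAL_FORMS` and re-running `classify` re-classifies every neuron, which mirrors the flexibility described above. Note that this check is closed-world: a missing compartment counts as absent, whereas OWL reasoning treats unstated facts as unknown.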
However, the greatest utility of OWL reasoning did not lie in the inference of new relationships based on complex logical deductions, but rather in consistency checking and the avoidance of errors in the knowledge base. During the development of the knowledge base, some errors were identified through simple reasoning processes. For example, based on class disjointness axioms in the ontology scaffold, the OWL reasoner pointed us to an error: some classes (e.g., ‘GABA’, a common abbreviation of ‘gamma-aminobutyric acid’) were subclasses of both ‘neurotransmitter’ and ‘receptor’, which was wrong. This was an error caused by the automated conversion: both the GABA transmitters and the GABA receptors were simply labeled with ‘GABA’ in the source database. The conversion algorithm generated URIs based on these labels, so they were represented with identical URIs (http://neuroweb.med.yale.edu/senselab/neuron_ontology.owl#GABA). Since ‘neurotransmitter’ and ‘receptor’ were declared as disjoint in the ontology scaffold, we could identify this problem early on and revise our conversion scripts accordingly. This error would have been noticed much later without the use of OWL reasoning, and would certainly have led to unexpected bugs in software that makes use of the ontology.
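The GABA error can be reproduced in miniature. The following sketch mimics label-based URI generation and then performs a direct disjointness check; it is not a description logic reasoner, and the base URI, the source rows and the function names are assumptions for illustration.

```python
from collections import defaultdict

def uri_from_label(label, base="http://example.org/neuron_ontology.owl#"):
    """Naive label-based URI generation, as in the faulty conversion step."""
    return base + label.replace(" ", "_")

# Hypothetical source rows: (label, parent class).
ROWS = [
    ("GABA", "neurotransmitter"),
    ("GABA", "receptor"),
    ("Glutamate", "neurotransmitter"),
]

# Pairs of classes declared disjoint in the scaffold.
DISJOINT = [("neurotransmitter", "receptor")]

def find_disjointness_violations(rows, disjoint_pairs):
    """Report URIs asserted as subclasses of two disjoint classes."""
    parents = defaultdict(set)
    for label, parent in rows:
        # Identical labels collapse onto one URI, merging their parents.
        parents[uri_from_label(label)].add(parent)
    violations = []
    for uri, classes in sorted(parents.items()):
        for first, second in disjoint_pairs:
            if first in classes and second in classes:
                violations.append((uri, first, second))
    return violations

violations = find_disjointness_violations(ROWS, DISJOINT)
```

One way to avoid the collision, under the same assumptions, would be to include the entity type in the generated URI (for example a separate suffix for receptors), so that transmitter and receptor no longer share an identifier.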
The Web addresses for downloading or importing all OWL files of the SenseLab Semantic Web infrastructure are listed in . OWL makes it possible to import these ontologies into future ontologies by a simple reference to the URL of the ontologies. The ontologies can also be queried via Hypertext Transfer Protocol (HTTP) with the SPARQL RDF query language. The SPARQL server is based on the open source version of Virtuoso , a web server with an integrated, highly scalable RDF database. Instructions for accessing the SPARQL server are available at .
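Querying over HTTP with SPARQL amounts to sending the query text as the `query` parameter of a request to the endpoint. The sketch below only builds such a request URL and makes no network call; the endpoint address and the query itself are placeholders, not the actual SenseLab server or a query from this paper.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; the real address is given in the access instructions.
ENDPOINT = "http://example.org/sparql"

# Illustrative query: list a few classes together with their labels.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cls ?label WHERE { ?cls rdfs:label ?label } LIMIT 10
"""

def build_sparql_get_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of an HTTP GET URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = build_sparql_get_url(ENDPOINT, QUERY)
```

The resulting URL can be fetched with any HTTP client; the response format depends on the server configuration and the `Accept` header sent with the request.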
The resulting SenseLab Semantic Web ontology collection is made up of seven ontology modules. Each ontology module is available as a separate OWL file with a specific Web address. The ontologies conform to the “OWL DL” specifications so that they can be classified by standard description logic reasoners. The separate ontology files give users the flexibility to selectively import or query those ontologies with a particular focus. The dependencies between ontologies are encoded in the ontology files through OWL ‘import’ statements. OWL-aware software can use these statements to load recursively all required ontology modules from the Web.
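Recursive loading of imported modules is essentially a graph traversal over the import statements. A minimal sketch follows, assuming the import statements have already been extracted into a dictionary; the module URLs are hypothetical and do not reflect the actual dependencies between the SenseLab modules.

```python
# Hypothetical import graph: ontology URL -> URLs it imports.
IMPORTS = {
    "http://example.org/module_a.owl": ["http://example.org/module_b.owl"],
    "http://example.org/module_b.owl": [
        "http://example.org/module_c.owl",
        "http://example.org/module_d.owl",
    ],
    "http://example.org/module_c.owl": [],
    # module_d imports module_b again to show that cycles are handled.
    "http://example.org/module_d.owl": ["http://example.org/module_b.owl"],
}

def import_closure(start, imports):
    """Return all ontologies reachable from `start` via import statements."""
    ordered, stack, seen = [], [start], set()
    while stack:
        url = stack.pop()
        if url in seen:
            continue  # already loaded; also guards against import cycles
        seen.add(url)
        ordered.append(url)
        # Reverse so that imports are visited in their declared order.
        stack.extend(reversed(imports.get(url, [])))
    return ordered

modules_to_load = import_closure("http://example.org/module_a.owl", IMPORTS)
```

Tracking the set of already visited URLs matters in practice, because ontology modules frequently share imports such as foundational ontologies.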
The basic statistics for each ontology module are summarized in Table 1. The NeuronDB ontology , ModelDB ontology  and BrainPharm ontology  contain the bulk of data from the respective SenseLab databases, together with some additional references to the NCBI Gene sequence database and the Uniprot sequence database. The other ontologies are mainly comprised of links/mappings between the SenseLab ontologies and ontologies created by other groups.
The biological function of receptors and transmitters was represented through subclasses of the ‘Function’ class from BFO (Fig. 3). Where applicable, classes from the ‘molecular function’ branch of the Gene Ontology were used (e.g, ‘dopamine receptor activity’). When no corresponding classes could be found in the Gene Ontology, new classes were created as part of the NeuronDB ontology and placed in the existing hierarchy of classes from the Gene Ontology. For example, the class ‘Dopamine D1 receptor activity function’ was created as a subclass of ‘dopamine receptor activity’ from the Gene Ontology. Molecules were linked to their molecular functions through the ‘has function’ property. For example, the dopamine receptor class has the defining property
has_function some ‘dopamine receptor activity’
The motivation for this exercise was to enable interoperability with the Gene Ontology, and other domain ontologies that make use of the Gene Ontology. In this way, the widely accepted Gene Ontology can be used as a bridge between ontologies about neuroreceptors, a knowledge domain where a widely accepted standard ontology is still lacking. For example, if another group developed their own ontology of neuroreceptors and referenced the Gene Ontology in a similar fashion, it would be possible to infer class equivalence between the independently developed ontologies based on the references to the Gene Ontology.
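The bridging role of the Gene Ontology can be sketched as follows. The example assumes that each receptor class is fully defined (necessary and sufficient conditions) by a single `has_function` restriction, which is what allows equivalence to be concluded; the ontology names, class names and function labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical defined classes: (ontology, class name) -> defining function.
DEFINITIONS = {
    ("senselab", "Dopamine receptor"): "dopamine receptor activity",
    ("other_group", "DA receptor"): "dopamine receptor activity",
    ("other_group", "Another receptor"): "some other activity",
}

def equivalent_groups(definitions):
    """Group classes that share the same defining function."""
    by_function = defaultdict(list)
    for cls, function in definitions.items():
        by_function[function].append(cls)
    # Only groups with more than one member express an equivalence.
    return [sorted(group) for group in by_function.values() if len(group) > 1]

groups = equivalent_groups(DEFINITIONS)
```

An OWL reasoner reaches the same conclusion from the class definitions themselves. If the restriction were only a necessary condition, the shared reference would not be enough to infer equivalence.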
The ‘has part’ relation from the OBO Relation Ontology found extensive use in the ontology. For example, the anatomic structure of the Archicortex was described with the following restrictions
has_part some Dentate
has_part some Hippocampus
The Hippocampus was described with
has_part some ‘CA1 oriens alveus interneuron’
has_part some ‘CA1 pyramidal neuron’
has_part some ‘CA3 pyramidal neuron’
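Because ‘has part’ is transitive in the Relation Ontology, the restrictions above imply that the Archicortex also has the listed hippocampal neurons as parts. The following sketch computes that closure over the examples given above; the dictionary representation and function name are assumptions, and a reasoner would derive the same result from the OWL axioms.

```python
# 'has_part some X' restrictions copied from the examples above.
HAS_PART = {
    "Archicortex": ["Dentate", "Hippocampus"],
    "Hippocampus": [
        "CA1 oriens alveus interneuron",
        "CA1 pyramidal neuron",
        "CA3 pyramidal neuron",
    ],
}

def all_parts(whole, has_part=HAS_PART):
    """Collect direct and indirect parts, treating has_part as transitive."""
    found = []
    queue = list(has_part.get(whole, []))
    while queue:
        part = queue.pop(0)
        if part in found:
            continue  # avoid revisiting shared or cyclic parts
        found.append(part)
        queue.extend(has_part.get(part, []))
    return found

archicortex_parts = all_parts("Archicortex")
```

A query such as "which neurons are found in the Archicortex" then reduces to filtering `archicortex_parts` for neuron classes.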
The finding that some CA1 pyramidal neurons have receptors for the neurotransmitter GABA in the Soma region was captured by the creation of a class with the following properties
has_part some (‘Soma’ that ‘has receptors’ some ‘GABA receptor’)
Some basic examples of queries that are possible based on the SenseLab ontologies are listed in Table 2.
In RDF/OWL, relations between entities defined in different ontologies do not differ from relations defined inside a single ontology, i.e., querying and inferencing can be done over several ontologies as if they were one. Classes in the SenseLab ontologies were connected to classes in other ontologies through class equivalence relations, class-subclass relations and whole-part relations. Examples for such relations spanning different ontologies are given in Fig 4. The import dependencies between ontology modules are depicted in Fig. 5.
The SenseLab ontologies presented in this paper are part of the Health Care and Life Science demo of the W3C Semantic Web for Health Care and Life Science Interest Group and Science Commons. The demo consists of a large collection of ontologies and RDF data from the biomedical domain. It has been further extended and maintained by Science Commons, forming the ‘Neurocommons Knowledge Base’.
We reaped several benefits from the use of Semantic Web standards and tools. The integration of the SenseLab ontology with several other neuroscientific Semantic Web resources was easily accomplished based on the foundational ontologies. The use of established ontologies like BFO and the Gene Ontology has led to a clear, consistent and transparent representation of biological reality that would not have been readily achieved with relational databases or XML documents. This facilitates shared understanding between developers as well as between users of the ontology. Furthermore, the semantics associated with ontology constructs are described in human-readable form directly in the ontologies, which makes most ontologies self-documenting. The use of OWL ontologies helped us focus our work on the description of biological reality, and less on unnecessary artefacts such as database tables, columns or documents. OWL reasoning and consistency checking allowed the automatic identification of logical errors introduced during data entry and conversion, as well as true contradictions in the research information. Many of these errors and contradictions would not have been identified without the use of reasoners and would have caused complications or incomplete results when querying and mapping the ontologies.
The use of foundational ontologies such as BFO  or the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE)  is beneficial and in certain cases indispensable for the integration of independent ontologies. Foundational ontologies allow the creators of domain ontologies to reuse basic ontological constructs instead of re-inventing them again and again.
Turning an existing database into a useful and semantically consistent ontology is in most cases not a purely mechanical endeavour. A useful ontology cannot simply be generated through a generic syntactic conversion. A semantic and ontological re-interpretation is necessary. Syntactic conversion alone is not enough for realizing complex integration of different databases, since the associated semantics often do not match or are highly ambiguous. The conversion has to be informed by biomedical domain knowledge, as well as knowledge of basic ontological principles. For example, the ontology creator should invest some time in answering questions such as “Is an electrical current across a membrane an object, a process or a property of the membrane?”, “Is the relation between the ‘hippocampus’ and the ‘hippocampus proper’ an is-a relation or a part-of relation?”, or “Is ‘neurotransmitter’ a class of molecules, or a role that certain molecules can play in a certain scenario?”. On the other hand, an overly precise ontology may hamper its effective use. A major factor in the success of any ontology is the balance between solid, logically consistent and unambiguous description of entities on one hand, and pragmatic features such as intuitiveness, ease of queries, openness to change and overall simplicity on the other hand.
One outstanding issue that needs to be addressed is the agreement on stable, preferably resolvable URIs for bioinformatics resources such as protein and publication records. Unfortunately, most primary data providers have not started producing usable URIs for their resources. The URI system that is being developed by Science Commons based on persistent uniform resource locators (PURLs) may be a possible solution to this problem.
Another pressing problem that caused difficulties during the development and use of our ontologies is the lack of scalable querying and reasoning support for OWL by triplestores. This makes it much more difficult to write queries and applications for OWL ontologies. The solution is the creation and standardization of new, OWL-aware triplestores and query languages. Such a solution may take a considerable amount of time. Therefore, our approach is to apply simple algorithms and best practices to make complex OWL ontologies amenable to existing, standard RDF tools and query languages. In addition, we have been collaborating with Oracle in exploring the use of Oracle 11g as a proprietary OWL triplestore for storing, querying and reasoning about OWL ontologies. This academic-industrial collaboration may help contribute to the future standardization of OWL-based triplestore technologies.
Lastly, more work needs to be done on the representation of uncertainty, evidence and data provenance in OWL ontologies. These are currently addressed by several working groups, including the W3C Semantic Web in Health Care and Life Science Interest Group (HCLSIG).
We have demonstrated how Semantic Web technologies can be used in the context of neuroscience data integration. While other projects have adopted Semantic Web standards like RDF and OWL for local information representation, our project is among the first to actually use Semantic Web technologies to create a neuroscience semantic web that spans different information sources hosted on different web servers and developed by independent groups. We also showed that the use of a more advanced logical formalism such as OWL, as well as the use of foundational ontologies, has real practical advantages. The Semantic Web has the potential to become a standard platform for semantic integration of neuroscience data.
Two future threads of development are based on the current work. First is the development of an easily accessible and intuitive web user interface to query the ontologies without needing to write verbose SPARQL queries. The development of Entrez Neuron , a web portal based on the ontologies presented in this paper, is one step in this direction. The second future thread of development is the exploration of strategies to make syntactically complex OWL ontologies such as NeuronDB better accessible to standard RDF tools and query languages. Furthermore, we are expanding the SenseLab ontology collection by: 1) adding mappings to other ontologies (e.g., the OBO Chemical Entities ontology) and 2) converting new databases to OWL.
The Semantic Web development in SenseLab is integral to the activities within the Semantic Web in Health Care and Life Science Interest Group (HCLS IG). The activities of this group span many different disciplines and are driven by participants from different sectors and countries. The existence of such a group with a strong backing in the communities of biology, medicine, computer science and philosophy is essential for the kind of large-scale information integration that is so often demanded — e.g., to realize a working infrastructure for translational medicine. The HCLS IG will continue to explore how to build a Semantic Web infrastructure for integrating biomedical data and disciplines, and to raise the awareness for Semantic Web technologies in the scientific community. The Semantic Web development in SenseLab will continue to contribute to this community activity.
This work is supported in part by: NIH grant P01 DC04732 and Fidelity Foundation, a postdoctoral fellowship from the Konrad Lorenz Institute for Evolution and Cognition Research, Austria and by the Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2). We thank the members of the W3C Health Care and Life Science Interest Group, the Science Commons/Neurocommons project and the developers of the Basic Formal Ontology for their feedback and cooperation.