Jeff Spitzner
Collaborative Knowledge Integration:
Enabling Semantic Exchange of Research

Life sciences R&D is collaborative and information intensive, and its primary goal is the creation of valuable knowledge assets. Realizing value from a project almost invariably requires recording the work and communicating or exchanging the information products associated with the research. Researchers typically report a common set of information about experiments: aims, background information, methods, samples, results, data analysis, discussion, and conclusions. The collection of information that constitutes a record of a complete experiment often includes content in paper lab notebooks, files and documents stored on personal and shared computers, and data managed in LIMS, document management, and database systems. Significant advances have been made in systems that support access, integration, and mining of these data.

Complete records for an experiment generally extend beyond a single data set or format to include documents, diverse scientific data types, analyses, annotations, and reports, plus metadata that relate authorship, date stamps, digital signatures, and other information. Although certain scientific data components of an experiment may now be described using informatics standards, there is no public standard enabling description, exchange, and reuse of a complete experiment. In fact, the only method for sharing an entire experimental record may be to send an email consisting of numerous file attachments, plus text to relate the contents. A scientist may readily interpret this information; however, because it lacks the 'machine-readability' needed for the semantic exchange and software interoperability of research information, the content is not suitable to enable integration in collaborative knowledge networks and diverse computing infrastructures. Thus, a lack of information standards for sharing research records is a significant barrier to generating and managing knowledge assets.

There now exists both a significant community need and an opportunity to develop an information encoding framework for the semantic capture, management, exchange, and reuse of records of experimental research. The goal is a public domain object model that provides a self-describing, layered, ontological basis for encoding the contents of an experiment syntactically, structurally, and semantically. I will review the components of a complete experiment record that must be captured for machine-readable exchange using a minimal Portable Experiment Format. I will then discuss requirements and strategies for developing a public domain Interoperable and Extensible Research Exchange (INTER-X™) Framework to support scientific collaboration. Achieving these objectives relies heavily on technologies and concepts that are now available. These include IT infrastructure (XML, RDF, OWL), general standards and representations for biomedical research content (e.g. STMML, LSID, and standards from NCI and NCBI), and domain-specific life sciences data standards, ontologies, and controlled vocabularies (e.g. CML, BSML, GO, MAGE, BioPAX, SBML, HapMap, NCI Ontology, SNOMED, HL7, and many others).
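
To make the idea of a machine-readable experiment record concrete, the following is a minimal sketch in Python, not the Portable Experiment Format itself, whose schema is not given here. The element names (experiment, metadata, section, data), the attribute names, and the sample file name are all hypothetical and chosen only for illustration; the section types are the report components listed above. The sketch builds a record as XML, serializes it, and reads it back, which is the round trip an email with loose attachments cannot offer.

```python
import xml.etree.ElementTree as ET

# Section types taken from the report components named in the abstract.
# The XML vocabulary below is hypothetical, not a published standard.
SECTIONS = ("aims", "background", "methods", "samples", "results",
            "analysis", "discussion", "conclusions")


def build_record(author, date, sections, attachments=()):
    """Build a minimal experiment record as an XML element tree."""
    unknown = set(sections) - set(SECTIONS)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    root = ET.Element("experiment")
    # Metadata layer: who recorded the work and when.
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "author").text = author
    ET.SubElement(meta, "date").text = date
    # Narrative layer: emitted in a fixed order so records are comparable.
    for name in SECTIONS:
        if name in sections:
            ET.SubElement(root, "section", {"type": name}).text = sections[name]
    # Data layer: references to files encoded in a domain standard.
    for href, fmt in attachments:
        ET.SubElement(root, "data", {"href": href, "format": fmt})
    return root


def read_record(xml_text):
    """Parse a serialized record back into plain Python structures."""
    root = ET.fromstring(xml_text)
    return {
        "author": root.findtext("metadata/author"),
        "date": root.findtext("metadata/date"),
        "sections": {s.get("type"): s.text for s in root.findall("section")},
        "attachments": [(d.get("href"), d.get("format"))
                        for d in root.findall("data")],
    }


record = build_record(
    author="J. Doe",                      # placeholder author
    date="2005-01-01",                    # placeholder date
    sections={
        "aims": "Measure compound solubility.",
        "results": "Solubility was 1.2 mg/mL.",
        "methods": "Shake-flask method at 25 C.",
    },
    attachments=[("compound.cml", "CML")],  # hypothetical file name
)
xml_text = ET.tostring(record, encoding="unicode")
parsed = read_record(xml_text)
```

A real framework would add the semantic layer described above, for example by typing sections and data references against RDF or OWL vocabularies rather than plain strings; this sketch covers only the syntactic and structural layers.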