In the bioinformatics world, as elsewhere, XML Schema has rapidly become the de facto format for defining XML data exchange standards. XML exchange schemas have been adopted, proposed or are under development, covering a wide range of information types and sources that are important to taxonomists and the wider biodiversity community. Indeed, much of this standardisation is being driven by organisations such as TDWG and GBIF expressly to mediate standardised data exchange and query for a wide user community.

At one level, 're-usability' can be achieved by simply cutting and pasting complex elements between documents. However, this can be difficult if elements have not been designed with modularity in mind, because of element interdependence, and it does not lead to stable interoperability. Maintaining discrete modular components, as distinct XML Schemas, will allow more reliable reuse and interoperability, with modules being reused via the XML include operation, etc.
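As a rough illustration of modular reuse, the sketch below assumes that the "include operation" refers to XML Schema's `xs:include` element, which pulls in another schema document that shares the same target namespace. The master schema, the module file names (`metadata.xsd`, `publication.xsd`) and the namespace URI are hypothetical and are not taken from any TDWG or GBIF schema. The Python code simply lists which modules a master schema reuses.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical master schema: file names and namespace are illustrative only.
MASTER_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/biodiversity"
           elementFormDefault="qualified">
  <xs:include schemaLocation="metadata.xsd"/>
  <xs:include schemaLocation="publication.xsd"/>
</xs:schema>
"""


def included_modules(xsd_text):
    """Return the schemaLocation of each xs:include directly under xs:schema."""
    root = ET.fromstring(xsd_text)
    return [inc.get("schemaLocation") for inc in root.findall(f"{{{XS}}}include")]


print(included_modules(MASTER_XSD))
```

Each module can then be maintained and versioned separately, and any schema that needs it declares the dependency explicitly rather than carrying a pasted copy.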

GBIF have identified the following types of information resources for which XML Schema 'must be agreed for' (sec. Hannu Saarenmaa 2002):

A number of leading standards which address various aspects of these are under development and are being adopted by GBIF and others:

Biodiversity Focussed

Wider Scientific Community examples

These may provide useful resources to reference in biological data, or provide useful XML structures to use in our schemas - and may become consumers of resources wrapped in our schemas, or of objects defined in our schemas and given global identifiers (e.g. TaxonConcepts), for example:

Notes on Metadata

Information being exchanged should always contain metadata describing the contents of the data, its provenance, validity, etc. It would be sensible to have a shared metadata structure for all biodiversity information resources, which could function as an envelope or wrapper for the data being transported. In fact, Donald proposes that the ABCD schema format could be adopted as a top-level structure for a wrapper. In his own words:

it does seem sensible to provide for the metadata to be transferred with each data set so that ownership information, usage restrictions, known limitations, etc. are not lost. I would therefore like to put effort into adopting or developing a top-level document envelope suitable for all classes of biodiversity data exchange. This should include information on origin and ownership, data transformation history, taxonomic, geographic and temporal coverage (as appropriate) and any metadata necessary to allow processors to identify the schema(s) in use for the actual data within the document. The real content should be separable for re-use in other contexts, but such a metadata wrapper standard would bring us closer to automating the manipulation of a wide range of content. I have in mind here something like the ABCD structure, with a top level DataSets wrapper containing a number of different DataSet objects, each of which is made up of a set of Units. In effect the Units element would be a container for data elements from SDD, ABCD, TDWG-Names, etc. The DataSet-level elements would provide a common metadata model for all of these documents
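The envelope described in the quotation can be sketched roughly as follows. Only the names `DataSets`, `DataSet` and `Units` come from the text above. The `Metadata` container, its child elements (`Owner`, `UsageRestrictions`) and the `Specimen` and `Description` payload elements are hypothetical placeholders, not elements of ABCD, SDD or any agreed schema.

```python
import xml.etree.ElementTree as ET


def build_envelope(datasets):
    """Wrap each data set's metadata and payload units in a DataSets envelope."""
    root = ET.Element("DataSets")
    for ds in datasets:
        ds_el = ET.SubElement(root, "DataSet")
        # Hypothetical metadata container: origin, ownership, restrictions, etc.
        meta = ET.SubElement(ds_el, "Metadata")
        for name, value in ds["metadata"].items():
            ET.SubElement(meta, name).text = value
        # Units holds the real content, which stays separable for re-use.
        units = ET.SubElement(ds_el, "Units")
        for unit in ds["units"]:
            units.append(unit)
    return root


def extract_units(envelope):
    """Return the payload elements without the surrounding metadata."""
    return envelope.findall("DataSet/Units/*")


# Hypothetical example data; the element names are placeholders.
example = build_envelope([
    {
        "metadata": {"Owner": "Example Herbarium", "UsageRestrictions": "none"},
        "units": [ET.Element("Specimen", id="s1"), ET.Element("Description", id="d1")],
    }
])

print(ET.tostring(example, encoding="unicode"))
print([u.tag for u in extract_units(example)])
```

Keeping the payload under a single `Units` element means a processor can read the common metadata without understanding the content schema, and can hand the content on to a schema-specific tool unchanged.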

It may of course be that metadata refers to resources which are themselves biodiversity data sets, or identifiers for such. For example, if GUIDs are adopted for Taxon Concepts, the metadata associated with an ecological data resource in EML could list the GUIDs of Taxon Concepts.

Of course, Gregor seems to have jumped in and implemented all of this in his UBIF schema, and provided the proxy mechanism for handling references to other data sources, which can hold the reference to the real complex data source and also a simpler, abstracted representation of the information.
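The proxy idea can be illustrated with a minimal sketch. This is not UBIF's actual structure: the class name, its fields, the `urn:example:` identifiers and the lookup table are all hypothetical. The sketch only shows the general pattern of pairing a reference to the full record with a simple summary that remains usable when the reference cannot be resolved.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Proxy:
    """Hypothetical proxy: a pointer to the real record plus a readable summary."""
    ref: Optional[str]  # identifier of the full record, if one is known
    summary: str        # simple, abstracted representation of the information


def display(proxy: Proxy, records: Dict[str, str]) -> str:
    """Prefer the full record when the reference resolves, else use the summary."""
    full = records.get(proxy.ref) if proxy.ref is not None else None
    return full if full is not None else proxy.summary


# Hypothetical lookup table standing in for an external data source.
records = {"urn:example:pub:1": "full record for publication 1"}

print(display(Proxy("urn:example:pub:1", "short citation 1"), records))
print(display(Proxy("urn:example:pub:2", "short citation 2"), records))
```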

Notes on Publications

In the absence of an accepted global standard, there are as many representations of publications as there are schemas! They vary in depth and detail, and are often flat lists of elements with a simple readable 'summary' string, or may be resource connectors (Gregor's Proxy elements). There are several attempts to provide a standard publication schema, e.g. MARC-XML and XOBIS, but as yet this is unresolved. For a comparison of some of these proposals see the following UBIF WIKI topic.