Our Opinion (open for comment!)

The TCS schema was conceived to allow the representation of taxonomic concepts as defined in published taxonomic classifications, revisions and databases etc. (see RationaleForATaxonomicConceptTransferSchema).

As such, it specifies the structure for XML documents to be used for the transfer of defined concepts. Valid transfer documents may either explicitly detail the defining components of taxon concepts, transfer GUIDs referring to defined taxon concepts (if and when these are available) or a mixture of the two.

The TCS schema is not designed to facilitate the exchange or documentation of information about Taxon Concepts where this information is not part of a taxonomic revision creating new concepts. (See also WhoDefinesTaxonConcepts). The amount and variety of (additional) information that can potentially be assigned to concepts is outside the scope of a taxonomic concept transfer schema, but we would encourage the development of domain specific models that use or extend this schema. XML supports this flexibility by allowing the use of different namespaces.
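As a rough illustration of the two points above (documents mixing fully detailed and GUID-only concepts, and extra information kept apart in its own namespace), here is a minimal Python sketch. Only the TaxonConcept and AccordingTo element names come from the discussion on this page; both namespace URIs, the DataSet root, the Name and Abundance elements, the id values and the placement of the extension element are assumptions made for the example, not the actual TCS schema.

```python
import xml.etree.ElementTree as ET

# Both URIs are invented for this sketch; they are NOT the real TCS namespace.
TCS_NS = "http://example.org/tcs"
ECO_NS = "http://example.org/ecology-extension"


def build_document() -> ET.Element:
    """Build a toy transfer document with one detailed and one GUID-only concept."""
    root = ET.Element(f"{{{TCS_NS}}}DataSet")  # root element name is an assumption

    # A concept whose defining components are spelled out explicitly.
    full = ET.SubElement(root, f"{{{TCS_NS}}}TaxonConcept", {"id": "c1"})
    ET.SubElement(full, f"{{{TCS_NS}}}Name").text = "Aus bus"
    ET.SubElement(full, f"{{{TCS_NS}}}AccordingTo").text = "Kubrick, 2001"

    # Non-definitional information lives in a separate (hypothetical) namespace.
    ET.SubElement(full, f"{{{ECO_NS}}}Abundance").text = "frequent"

    # An 'empty' concept that only carries an identifier to be resolved elsewhere.
    ET.SubElement(root, f"{{{TCS_NS}}}TaxonConcept",
                  {"id": "urn:lsid:example.org:concepts:42"})
    return root


def tcs_only_children(concept: ET.Element) -> list:
    """Return only the child elements that belong to the (assumed) TCS namespace."""
    return [child for child in concept if child.tag.startswith(f"{{{TCS_NS}}}")]
```

A consumer that only understands the core schema could call tcs_only_children() and ignore everything else, which is the kind of separation that namespaces make possible.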

For example, whilst a TCS taxon concept definition may include details of specimen circumscription (i.e. list specimens that are asserted to define the taxon concept), datasets that merely include observations identifying specimens as being examples of a taxon concept would reference a defined taxon concept, not constitute a new or modified concept. That is, TCS documents are for transferring the definitions of taxon concepts, not for detailing observations of these defined concepts.

Examples of observational datasets that could refer to defined taxon concepts might include

The observations recorded in these would not constitute definitions of the concepts, but document instances of the concept. (However, observation datasets might themselves be included as part of a taxon concept definition, e.g. as specimen or character circumscriptions - if this is what the 'creator' of the concept intended).

Once concept definitions are globally available and have been assigned GUIDs, referencing them in observation datasets etc. is simplified. In the absence of GUIDs some form of user key could be recorded that would identify a preexisting concept. This might consist of the scientific name and concept author (AccordingTo).

Where there is no appropriate, preexisting globally defined concept to reference, datasets could provide definitions of the concept being described/observed/identified in TCS format. These would minimally consist of a name, but optimally include reference to the origin of the concept definition (the AccordingTo element in TCS). Thus field data that records presence of Aus bus, Aus cus and Aus dus, as identified in the field guide of (Kubrick, 2001), should include this information in the data description/mark-up. It will then be possible to resolve these incompletely defined concepts with other concept records represented in TCS format.
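The field-guide example could be marked up along the following lines. This is only a sketch: the ConceptRef class, its field names and the mark_up helper are hypothetical, and the "name sec. authority" rendering simply follows the "sec." convention used elsewhere in this discussion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ConceptRef:
    """Hypothetical reference to a taxon concept attached to an observation."""
    name: str                           # scientific name (the minimum)
    according_to: Optional[str] = None  # origin of the concept definition
    guid: Optional[str] = None          # filled in once a GUID is available

    def user_key(self) -> str:
        """Human-readable composite key: name plus AccordingTo when known."""
        if self.according_to:
            return f"{self.name} sec. {self.according_to}"
        return self.name


def mark_up(names: List[str], according_to: str) -> List[ConceptRef]:
    """Attach the same AccordingTo source to every name identified in the field."""
    return [ConceptRef(name=n, according_to=according_to) for n in names]


records = mark_up(["Aus bus", "Aus cus", "Aus dus"], "Kubrick, 2001")
```

A record built without an AccordingTo value falls back to the bare name as its key, which mirrors the 'minimally a name' case described above.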

The quality of concept resolution will depend on the detail of concept description provided. Where it is possible to reference Taxon Concept GUIDs, complete unambiguous identification is provided. On the other hand, if only a binomial name is recorded (as a partial 'user key' for a Taxon Concept), the best resolution possible would be to any concept sharing that name. If a dataset refers to a name + AccordingTo authority as a composite 'user key' for a Taxon Concept, accurate concept resolution might be possible to identically defined concepts, or it may be necessary to resolve the identification through intermediates. (For example, if Kubrick's 2001 field guide is known to use the classification of Kirk et al. 1971, an identification to a poorly defined 'Kubrick' concept might be resolved to other data using the better defined Kirk concepts.)
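The tiers of resolution described above might be sketched as follows. Everything here is illustrative: the in-memory CONCEPTS store, the guid-1/guid-2 identifiers, the 'Author X, 1980' entry, the FOLLOWS table and the quality labels are all invented for the example, and a real resolution service would look quite different.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical store of defined concepts: guid -> (name, according_to).
CONCEPTS: Dict[str, Tuple[str, str]] = {
    "guid-1": ("Aus bus", "Kirk et al., 1971"),
    "guid-2": ("Aus bus", "Author X, 1980"),  # made-up second concept
}

# Known intermediates: a source that follows another classification.
FOLLOWS: Dict[str, str] = {"Kubrick, 2001": "Kirk et al., 1971"}


def resolve(name: str,
            according_to: Optional[str] = None,
            guid: Optional[str] = None) -> Tuple[str, List[str]]:
    """Return (quality label, matching GUIDs), trying the best key first."""
    # 1. A known GUID identifies the concept unambiguously.
    if guid is not None and guid in CONCEPTS:
        return "exact", [guid]

    # 2. Name + AccordingTo, following intermediates when there is no direct hit.
    authority, seen = according_to, set()
    while authority is not None and authority not in seen:
        seen.add(authority)
        hits = sorted(g for g, (n, a) in CONCEPTS.items()
                      if n == name and a == authority)
        if hits:
            label = "name+authority" if authority == according_to else "via-intermediate"
            return label, hits
        authority = FOLLOWS.get(authority)

    # 3. Name only: every concept sharing the name is a candidate.
    hits = sorted(g for g, (n, _) in CONCEPTS.items() if n == name)
    return ("name-only", hits) if hits else ("unresolved", [])
```

With this toy data, an identification recorded as 'Aus bus' according to Kubrick, 2001 resolves through the intermediate to the Kirk concept, whereas the bare name 'Aus bus' can only be narrowed down to both stored concepts.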

Of course, using GUIDs to mark up data is not 'user-friendly', and does not provide a human-readable record of the concept. Users may prefer to use or be provided with composite user keys for concepts (even alongside GUIDs where these are available). (Composite user keys that might uniquely identify concepts might represent the Scientific Name + AccordingTo authority). Users would have to be aware that the accuracy of concept resolution would be decreased if GUIDs are not available/used or truly unique key combinations are not specified.


The function of TCS Documents is to transfer Concept Definitions, not information about Concepts. Domain specific extensions to the TCS Schema could define Document structures to transfer extra information about Concepts. (copied and pasted from DraftStandardDiscussion)

What is the concept definition? TCS v0.72 requires only ID for a TaxonConcept. Does it mean that a defined concept in TCS is something 'approved' by GUID issuing body? -- JamesYtow, 01-Sep.-2004

Not at all. GUIDs are a mechanism people want/need to be able to reuse concepts without having to pass all the information in a concept definition around. To do this the schema has to allow 'empty' concepts that just contain the GUID which can be resolved to a concept held somewhere. There should be no concept 'police' - and probably multiple GUID issuers - possibly anyone who can provide a resolution service (there is no point to a GUID that doesn't represent a real object somewhere). The 'concept definition' could be anything anywhere that can be represented as a TCS concept, e.g. an ITIS, IPNI, IOPI, Nomencurator 'object'. 'Issuing Bodies' might sort of 'approve concepts' if they are only issuing GUIDs for 'their own' concepts. For example you might set up an LSID server to provide GUIDs for concepts ('Name usages') that you wished to share with others. They could then reference your concepts in datasets etc. - and taxonomic queries could resolve the concepts through your LSID server to give the full definition of the concept (according to your data model).

GUIDs etc. are a big issue and TDWG/GBIF are planning to discuss them in New Zealand..... -- TrevorPaterson 01 September

The main issue I asked about is not the GUID itself: a GUID is a kind of name, so we need to think carefully about how yet-another-name can improve the situation without some mechanism similar to registration, but it is not my major concern here. The point is what the definition of a taxon concept in TCS is. If "The function of TCS Documents is to transfer Concept Definitions, not information about Concepts" then we need a clear definition of Concept Definition. If the schema carries information about concepts, on the other hand, we can use a relaxed, less formal but informative approach, like the flexibility of the TCS. If Concept Definition depends on each data model using TCS, then how can we assure that TCS transfers Concept Definitions but not information about Concepts? -- JamesYtow, 01-Sep.-2004

TCS only has elements that we consider might be part of the definition of a concept, after talking to a wide variety of taxonomists and database managers. It seeks to represent whatever they consider to be the definition of a concept - not what we consider to be the definition of a concept. Where in the schema could such extra non-definitional information be described without extending the schema? -- TrevorPaterson 02 September

OK, the Concept Definition is something relaxed; even only a GUID or Name can be a Concept Definition. What is the difference between a Concept Definition and information about a taxon within this relaxed sense? How can we differentiate 'poorly defined', 'the same', 'bona fide' as used in WhoDefinesTaxonConcepts? -- JamesYtow 02-Sep.-2004

Author 1 publishes a concept definition TC1; authors 2, 3, 4 etc. refer to TC1 to label their data. They are not publishing a new concept, just saying they have observations about specimens/organisms etc. that they think are examples of TC1. This new information is not part of the definition of TC1. A trivial example would be: Linnaeus defined the concept of a daisy - I record the fact that there are a lot of daisies (sec. Linnaeus) in my garden...... -- TrevorPaterson 03 September

What we have are: publications of authors 1, 2, 3, 4 and so on. Authors 2 onwards say that they 'think' their objects are examples of TC1. What they record is not part of TC1 of course, but neither does it need to be the same as TC1. I'd like to record the fact that there are authors who used the name literal with reference to TC1. These records do not need to be TC1, just as you may have mutants or even GM daisies in your garden. -- JMS 03-Sep.-2004

OK, but you aren't modelling the process of taxonomy then....just the process of data accumulation - every time someone publishes a new human DNA sequence do we have a new concept of a human.....? If you want a database to handle that - good luck ;-) -- TrevorPaterson

We need a model to record facts in taxonomic processes in order to track those processes, and a mechanism for tracking the process based on recorded facts. Deciding when a new concept is created can depend on the views of taxonomists/users. If a user decides that the addition of a DNA sequence changes a taxon concept, then it should be manageable as different concepts for that user. It is not the task of an exchange schema. So I worry how we can define 'the same' concept, which should be an essential part of giving a GUID to a concept. Although the TCS has flexibility allowing multiple representations, a GUID for concepts would spoil the flexibility. I think the schema should be separable from the definition of concept definition to utilise this flexibility. -- JamesYtow 03-Sep.-2004 / amended 04-Sep.-2004

In general I think we need to be careful about distinguishing database models from data exchange schemas. Using databases, a complete structure of interrelated data elements can be created that can record just about any set of complex relationships, including a history of prior values of data elements. Relational databases can record many-to-many data relationships that allow the same data to be viewed from multiple angles. But an exchange schema is a static document with only one view of the data. It seems sometimes that people are discussing XML exchange schemas as if they were databases. An XML exchange schema does not create a concept or a concept record, it merely passes the data along. An XML exchange schema does not create a GUID, it merely passes it along. An XML exchange schema is part of a process of communication, it is not the process itself. -- ChuckMiller 03-Sep.-2004

I'm going to TDWG so I'm trying to get my mind around this new schema. I gather the intent of the TCS schema is to deliver a set of concept data elements resulting from a query for some values of some data elements included in the schema. But the pure TCS schema itself contains no data. It is used to format data from a data source, presumably a database. So, presumably there are databases populated with all the concept information (like lists of specimens) that will respond to the query in TCS schema format? Unless data has been recorded in a database somewhere, a query will return no result. I wonder, if there is no control over GUIDs other than (I presume) uniqueness, won't this just quickly turn into a mass of GUID synonymy? Multiple GUIDs can refer to the same concept. And worse, if they are strictly serialized numbers, they are meaningless strings of characters that are hard for humans to distinguish. How is that useful? Since the concept data records will be distributed across multiple data repositories and there is no control for standardization, how could synonymous GUIDs ever be revised or corrected? This is all confusing to me. -- ChuckMiller 04-Sep.-2004

My take on this is as follows: A data provider has a database of taxonomic information (concepts, names, whatever he finds useful to record). He wants to make the data available to the community so he converts relevant information into the TCS either as a one off file or via a (web-) interface. In the latter case the interface might be able to return records by ID (there needs to be some sort of guarantee that the same record is returned for the ID). If this is the case the IDs can be registered with a resolution service (centralised or distributed - still to be discussed) and become a GUID. This is useful as from now on it can be used to refer to concepts from other TCS concepts records without including the data, as it can be retrieved on demand. And also the GUID can be used to mark up identifications or other datasets (e.g. ecological).

So far the potential. In order to be usable, the framework actually needs to exist and contain concepts for people to refer to when they create (or transfer) their own. Ideally only the owner (or alternatively its representative) should provide concepts. This bootstrap process is going to be difficult (considering the large amounts of legacy data, but also deciding who is the 'representative' of a legacy concept), but we hope that by getting existing databases that contain original concepts (such as IPNI) in at the beginning we can create a basis that other people will use. --RobertKukla 6/9/2004

If IPNI is an intended original source, the ConceptSchema could be considerably abbreviated, as IPNI basically contains Name, Author, Publication Reference, Date. It may also have up to three occurrences for the same name, one each for IK, Grey Cards, and APNI. Ostensibly all three are referencing the same name at the same publication & date. Is this what is being considered a concept? Would these IPNI records be 3 different concepts or 3 references to the same concept? --ChuckMiller 10/9/2004

The majority of elements are optional, so if in particular cases no information is available they can be omitted. If by abbreviation you mean removing elements from the schema definition, we would consider this inappropriate, as in some situations the information is available. Currently we understand that IPNI may have multiple references to potentially the same original concepts, and we would not want these to be considered three different concepts. We would hope that IPNI resolves these duplicates of concepts to allow the use of their records as a basis for the original concepts. We realise that these are 'only' references to publications and that the whole definition for these concepts as defined in the publications is not currently held by IPNI. As far as we know they don't have any plans to store this information, but it would improve the quality of the data. It is also possible that another provider could record this information by using IPNI's records as a source for bootstrapping the system. --JessieKennedy&RobertKukla 10/09/04