There is a widespread view that the community requires a transfer schema to allow the exchange of taxon concepts. This was confirmed at TDWG in October 2004 and in subsequent discussions on the relevant Wikis and mailing lists. Part of the design process for the Taxon Concept Schema (TCS) involved considering how to model biological names. The TCS adopted the view that names are part of a taxon concept and do not exist independently. A name may have been created independently of any taxonomic judgement, but that name is still transitively dependent on a concept. It has been argued, however, that names should be objects in their own right, and there are differing views on how complex a model of names needs to be.

Following discussion about Name Objects on the mailing list, and in order to reach some consensus, we have developed a series of VotingDraft versions of the schema. These run from the VotingDraft (TCS_0.95) submitted to TDWG in March 2005, which incorporated Linnean Core elements into the Taxon Concepts, through to VotingDraft2 (TCS_0.95.2), which has Names as top level elements and takes most of the design for the Name element from the work of the Linnean Core sub-group. This schema allows users to pass around names without concepts, but still requires users to use concepts when appropriate, for example when identifying specimens. It also still allows for the idea of nominal concepts, where the exact definition of the concept isn't clear, as in legacy data for example. This version has undergone some refinements and has now been presented as the VotingDraftFinal version of TCS, TCS 1.0, to be presented at TDWG2005 for voting.

History:

The goal of TCS was to develop a schema that dealt with taxonomic names and concepts. The approach taken was to be as inclusive as possible with regard to how different user groups used names and concepts in their taxonomic databases, so that they could exchange data. This approach was encouraged by Stan Blum after he heard a talk to the SEEK group about trying to model the different views people seem to have about what a concept is or isn’t, and how, for example, ecologists or other biologists and users refer to taxa in identifications, surveys etc. We explicitly did not set out to say that a name or a concept must have x, y and z, nor did we set out to model the codes of nomenclature or the process of taxonomy. We therefore did not plan to define a model of names/concepts specifically for nomenclaturalists or taxonomists or ecologists or whomever. Rather, our approach was generalist, i.e. to provide a mechanism whereby any data provider or user could take the schema and map their data to it to allow exchange. We tried not to go for the common denominator approach of including only those elements everyone had in their database (there would be none!), but instead to allow optionality in the schema elements. So if you believed a concept must have certain elements, and that they must be combined according to certain rules, you could represent your data, but not necessarily the rules about how you generated it. If, on the other hand, you had some pretty sparse data that you thought was important, you could still use the schema to represent and exchange it. The solution was destined not to map directly to any one user’s database model or view of what concepts or names were, but was designed as a framework onto which all views could be mapped.

Given the diversity of existing data models which have all been designed for a particular purpose there was no clear schema which could be chosen as a starting point. So instead we tried to raise the level of abstraction of the entities that we were considering in order to find some commonality across the different models and requirements. The most common feature was that people use “names” to communicate about things. In general they were either

So our model started from the basic premise that we needed to model names and definitions, which we called TaxonConcepts, a TaxonConcept being a single name and a definition. However, what one person thinks of as a name differs from what another does (usually depending on the purpose of the communication and on the required precision). “Names” can have several components, and the particular components constituting any specific name may vary for many reasons. Additionally, there may be differences within any one component of a name due to optional variation in the representation, e.g. whether abbreviations or full names are used, or due to misspellings. Therefore, when a name is used to imply a concept, the name given can be more or less precise in determining the definition, or in other words in uniquely identifying the TaxonConcept.

• TaxonConceptName (Aus bus L. sec. Archer 1969) – identifier for a concept
• NomenclaturalCodeCompliantName – identifier of a type specimen
• Latin Name – imprecise
• Common Name (daisy) – more imprecise
• Temporary name (Aus sp1.) – even more imprecise

Names don’t define concepts; they simply allow us to identify concepts more or less precisely, implying there is a definition somewhere. So, to include the definition of the name where available, we looked at how people record concept definitions. Again there was an enormous variety of possible components, depending on the method of working, so we opted for an optional structure allowing any combination of the sub-components to be used.

TCS wanted to support any of these types of names with whatever definition of the concept was available. However, we knew that, depending on the source of the data, we were more or less likely to get a particular interpretation of what a concept was. We modelled this by providing a type attribute to indicate the type of concept. The type could be decided by the provider to aid in interpreting the data (or potentially by the receiver when comparing data).
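As a rough illustration of the idea of one name plus an optional definition and a type attribute, here is a minimal Python sketch. It is not the TCS schema itself: the field names (concept_type, character_circumscription, vouchers, publication), the type value "nominal" as written here, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaxonConcept:
    """Hypothetical sketch: a single name plus whatever definition is available."""
    name: str                                         # exactly one name per concept
    concept_type: Optional[str] = None                # provider-supplied hint (illustrative values)
    character_circumscription: Optional[str] = None   # optional definition component
    vouchers: List[str] = field(default_factory=list) # optional definition component
    publication: Optional[str] = None                 # optional definition component

    def definition_components(self) -> List[str]:
        """Return the names of the definition components that are actually present."""
        present = []
        if self.character_circumscription is not None:
            present.append("character_circumscription")
        if self.vouchers:
            present.append("vouchers")
        if self.publication is not None:
            present.append("publication")
        return present


# Sparse data: a temporary name with no definition at all is still representable.
sparse = TaxonConcept(name="Aus sp1.", concept_type="nominal")

# Richer data: the same structure carries any combination of components.
rich = TaxonConcept(
    name="Aus bus L. sec. Archer 1969",
    character_circumscription="made-up character description",
    vouchers=["made-up-specimen-001"],
    publication="Archer 1969",
)
```

Both records use the same structure; the only difference is which optional components the provider chose to fill in.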

In addition, we were very much in favour of trying to get leverage from the other schemas being developed for TDWG. Where we had a character circumscription element in the definition of a TaxonConcept, which basically described the “characters” of the taxon, we expected to be able to include SDD or some appropriate subset of SDD. Where we had vouchers we expected to include ABCD or the appropriate subset. Where we had publications we would include the citation part of a publication schema, etc. Likewise we expected that ABCD, for example, would include a reference to a TaxonConcept at an appropriate place.

TCS was presented at TDWG in October 2004 to a mixed reception. Our generalist approach seemed to cause problems in communication. People tried to map their understanding of the terms used in the schema directly onto their own terms, which meant we had difficulty communicating our ideas, and people couldn’t communicate with each other without causing confusion to themselves or to those they talked to! So at one level we got it right: they were all starting to talk about the same abstract things. But they put too specific semantics onto the terms, and at some point suddenly found they weren’t talking about the same thing. The prime example was TaxonConcept, by which we meant a name+definition. Many people refused to accept this and continually interpreted TaxonConcept as their own understanding of concept, which corresponds to the TCS understanding of definition.

For example, nomenclaturalists, a particular type of user of taxonomic names, decided TCS didn’t meet their needs because the Name element, which is the primary entity from their perspective, was not a complex entity that allowed them to represent all the information about names they saw as important. They interpreted the Name element to be the only way to store information about names. The Name element was taken from the ABCD schema as a relatively simple type to represent names as written, with options to suit the different nomenclatural codes. It contained no additional elements for information about relationships between names, or links to type specimens or publications. It was what was required by ABCD (although we believe a TaxonConceptName is more appropriate), and is similar to what we believed was required by SDD. Neither ABCD nor SDD wanted to incorporate all of TCS into their schema; rather, they wanted some human readable reference to an implied concept.

What TCS required of the nomenclaturalists was to forgo a precise model of the rules specified in the Codes of Nomenclature regarding names, while ensuring that, using the schema, they could represent all the data required to follow the rules in their work and capture the data resulting from that work. It would never look like it had been tailored for them unless we defined a name schema specifically for nomenclaturalists, and they would most likely never use the entire schema. In order to represent their data they had to accept the premise that TaxonConcepts as designed could have only one name, which meant that they could basically use the TaxonConcept to represent a Name and thereby capture the complexity they require for names. Nomenclaturalists want to model the basic Name representation (similar in principle to the ABCD Name element, although this could be modelled differently), the relationships between names, the type specimens for names and the publication in which the name was published. We believe the elements to do this are already in the TaxonConcept. The terminology used in TCS might not be equivalent to theirs; however, there is no agreed terminology for the entities and attributes being modelled among nomenclaturalists, never mind across data modellers of taxonomic information.

Additionally, the modelling style adopted by TCS could differ from theirs. For example, references between names or concepts can be modelled as attributes or as relationships of a particular type; there are pros and cons to both, mainly computational. Suppose a conceptual model has several 1:1 relationships of the same kind (has_x, has_y, has_z) between two entities (A, B). This can be modelled either by giving the entity the attributes has_x, has_y and has_z, into which identifiers for the related entity can be entered, or by having a separate relationships table containing the identifiers for the two entities and a relationship_type attribute specifying whether the relationship is of type has_x, has_y or has_z. This is purely a modelling decision and has nothing to do with whether or not the information can be represented. If the domain is well understood, there is agreement on what attributes are required for an entity, and the majority of entities have all of the relationships, then the former approach is reasonable. However, when the entities in the domain are variable in constitution for whatever reason, and it is difficult to ensure all relationship types are modelled explicitly, then the latter approach is more flexible, i.e. it is more of a generalist approach, and it is the one taken by the TCS. It is easier to change the possible values for an attribute than to change the schema.
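The two modelling styles can be sketched side by side in Python. This is only an illustration under stated assumptions: the class names EntityWithAttributes and Relationship, the helper to_relationships and the extra relationship type "has_w" are hypothetical; only has_x, has_y, has_z and relationship_type come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


# Style 1: one attribute per relationship kind, fixed in the schema.
@dataclass
class EntityWithAttributes:
    id: str
    has_x: Optional[str] = None  # identifier of the related entity, if any
    has_y: Optional[str] = None
    has_z: Optional[str] = None


# Style 2: a separate relationship record carrying a type attribute.
@dataclass(frozen=True)
class Relationship:
    from_id: str
    to_id: str
    relationship_type: str  # e.g. "has_x", "has_y" or "has_z"


def to_relationships(entity: EntityWithAttributes) -> List[Relationship]:
    """Convert the attribute style into typed relationship records."""
    rels = []
    for rel_type in ("has_x", "has_y", "has_z"):
        target = getattr(entity, rel_type)
        if target is not None:  # unset attributes produce no relationship
            rels.append(Relationship(entity.id, target, rel_type))
    return rels


# Entity A is related to entity B by has_x and has_z, but not by has_y.
a = EntityWithAttributes(id="A", has_x="B", has_z="B")
rels = to_relationships(a)

# In style 2 a new kind of relationship is just a new value, not a schema change.
rels.append(Relationship("A", "B", "has_w"))
```

Both styles hold the same information for has_x, has_y and has_z. The difference only shows when a new relationship kind appears: style 1 needs a new attribute on the entity, while style 2 needs only a new value for relationship_type.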

If the schema was modelled specifically to suit the nomenclaturalists then it would be biased to their view of the world and wouldn’t necessarily suit other users. Therefore, we would need name/concept schemas for ecologists, taxonomists (working primarily on specimens who see their specimens as the primary definition), taxonomists (who relate existing taxa to each other and focus on characters without much attention to the specimens actually used), list providers, museum curators etc. We would then have the problem of mapping between these schemas and the potential problem of some users not being able to develop their data resources until other users had developed their databases. Therefore we tried to keep the approach general while addressing the concerns of the nomenclaturalists.

There were many discussions as to the pros and cons of having names as concepts, and whether or not people were prepared to accept that for data exchange we can in fact have a general schema which could be of advantage to the community at large, although not so tailored to any individual user group. Much of the discussion between the nomenclaturalists and TCS was about clarifying the codes rather than modelling the data resulting from the application of the codes. Instead of trying to capture all of the data with the elements provided, the discussion has come down to what the names of the elements mean and whether or not they agree with the codes.

Following discussion about NameObjects on the mailing list we decided to present a version of the schema with Names as top level elements. This allows users to pass around names, but still requires users to use concepts when, for example, identifying specimens, and still allows for the idea of nominal concepts where the exact definition of the concept isn't clear. See VotingDraft2. This version has been modified and has resulted in version 1.0, to be presented for voting at TDWG 2005.

JessieKennedy 1/7/2005: