Using LinneanCore representations of a 'Name' in the Taxon Concept Schema

Background

The TCS is designed as a neutral exchange schema that aims to represent taxonomic concepts (and their definitions and relationships) as defined by taxonomic data providers. In essence, a Taxon Concept is minimally a full Scientific Name together with a recorded authority for a particular definition of the concept (According To / sensu / sec.). Components of the definition may be explicitly recorded in the TCS, be found in the According To citation, or merely be implicit. For ‘Original Concepts’ this According To Authority will generally match the Authorship included in the original first use of the scientific name (i.e. for the basionym/protonym). Although ‘poorly defined’ as concepts, ‘bare names’ can be represented as ‘Nominal Concepts’, which comprise only the scientific name in the absence of an According To Authority.
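The distinction between a full concept and a Nominal Concept can be sketched as a small data structure. This is only an illustration under our own assumptions: the class `TaxonConcept`, its fields and the `label` method are hypothetical names, not elements of the TCS schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaxonConcept:
    """Hypothetical in-memory stand-in for a Taxon Concept (not the TCS XML)."""
    name: str                           # full scientific name as a string
    according_to: Optional[str] = None  # According To / sensu / sec. authority

    @property
    def is_nominal(self) -> bool:
        # A 'Nominal Concept' is a bare name with no According To authority.
        return self.according_to is None

    def label(self) -> str:
        # Show the authority after 'sec.' when one is recorded.
        if self.is_nominal:
            return self.name
        return f"{self.name} sec. {self.according_to}"


defined = TaxonConcept("Aus bus (L.) Smith", according_to="Smith 1999")
bare = TaxonConcept("Aus bus (L.) Smith")

print(defined.label())  # name plus its According To authority
print(bare.label())     # bare name only
```

The record is frozen to mirror the idea that a concept, once published by its authority, is not edited in place.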

** If you are publishing a new concept using the name "Silene alba", that concept is 'AccordingTo' you, and your definition can reference the previous one (pro parte synonym...). Otherwise you are just using the earlier definition, with a statement saying that you are only using it pro parte. It is up to you as a taxonomist to decide which of these you are doing, and up to later taxonomists to decide whether they want to use the earlier concept definition or yours. (NB the string "Silene alba p.p." on its own seems pretty meaningless, as it does not reference whose concept of the taxon you are asserting p.p.) TrevorPaterson 13Dec2004

All taxonomic and nomenclatural relationships have been modelled as relationships between Taxonomic Concepts (which of course always include a Name), as we believe that the vast majority of meaningful relationships are in fact between one concept represented by a name and another. This model allows (and arguably requires) that all relationships between names are expressed as relationships between concepts. It is possible to represent what might be considered purely nomenclatural relationships and information in this manner. Nomenclaturists wish to capture the history of a ‘Name’; in the TCS model this would be done by creating a series of Taxon Concepts that relate back (recursively) to an original concept. Thus, to record the basionym for a currently ‘valid’ name, a concept containing the current name could be related to a concept containing the basionym via a ‘has basionym’ relationship. This one concept could therefore be recorded as the basionym for many alternative Taxon Concepts.
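As a rough sketch of the ‘has basionym’ idea, relationships can be held as (source, type, target) triples between concept identifiers. The identifiers, example names and helper functions below are invented for illustration and are not part of TCS.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical concepts, keyed by an invented ID.
concepts: Dict[str, str] = {
    "c1": "Aus bus L. sec. Linnaeus 1753",        # original concept
    "c2": "Xus bus (L.) Smith sec. Smith 1999",   # later combination
    "c3": "Xus bus (L.) Smith sec. Jones 2004",   # another concept, same name
}

# Relationships are always between concepts, never between bare names.
relationships: List[Tuple[str, str, str]] = [
    ("c2", "has basionym", "c1"),
    ("c3", "has basionym", "c1"),
]


def basionym_of(concept_id: str,
                rels: List[Tuple[str, str, str]]) -> Optional[str]:
    """Return the ID of the concept recorded as basionym, if any."""
    for source, kind, target in rels:
        if source == concept_id and kind == "has basionym":
            return target
    return None


def concepts_with_basionym(basionym_id: str,
                           rels: List[Tuple[str, str, str]]) -> List[str]:
    """Return IDs of all concepts that point at the given basionym concept."""
    return sorted(s for s, kind, t in rels
                  if kind == "has basionym" and t == basionym_id)


print(concepts[basionym_of("c2", relationships)])
print(concepts_with_basionym("c1", relationships))
```

Here one original concept (`c1`) serves as the basionym for two later concepts, which is the many-to-one situation described above.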

There are many clear advantages to working biologists and taxonomists in moving towards a concept-based model of taxonomy, because this allows explicit recording and resolution of what taxon is actually being recorded or observed. However, it does require a separation of Scientific Names from Relationships.

As currently modelled, TCS therefore calls for a representation of Names that excludes any incorporation of relationships as part of the Name. Original prototypes of the TCS Schema included the TDWG ABCDSchema ‘Name Atomized/Detailed’ fields to hold components of Full Scientific Names (Latin Mono/Binomial, Original Author, Date etc.). Many felt that the division of Name types according to Code/Kingdom in this Schema was clumsy and unnecessary, and could be unified. It was also argued that not every part of a name could be represented in this ABCD schema fragment. An alternative Name Schema (LinneanCore) is now being developed through TDWG by Nomenclaturists who desire full representation of names.

Undoubtedly the TCS will benefit from the LinneanCore development work, ideally by being able to include the LinneanCore representation of Names in place of the ABCD Name Detailed placeholder, but also from the work being performed to enumerate relationship types, variants of name structures etc., even if the complete LinneanCore schema cannot be fitted into the current TCS model.

In the report below we consider some of the aspects of the current development versions of LinneanCore, particularly how they relate to the TCS model and the separation of Names from relationships, which underpins the representation of Taxonomic Concepts.


1. Inclusion Of <LC:ScientificName> under <TCS:Name>

a. <TCS:NameSimple>

LC1.5 suggests <TCS:NameSimple> becomes <TCS:NameLiteral>, representing the orthography of the name in this According To citation, but not necessarily the same as that calculated from NameDetailed. This differs from the function desired in TCS: that NameSimple represents the full scientific name as concatenated from separate fields in the According To datasource/NameDetailed or, where this is not possible or the separate name fields are not provided, the string representation provided in the data source (e.g. Aus bus (L.) Smith 1999). TCS would not wish to provide a different name here from what could be calculated from the NameDetailed fields. In particular, we would not wish the NameSimple to diverge over time from a changing NameDetailed. We would see both of these elements as being fixed by the According To Authority; later alterations to the name would be recorded in subsequent Taxon Concepts and related back to this concept.
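The intended behaviour of NameSimple can be sketched as follows: concatenate the atomized fields when they are supplied, otherwise fall back to the string given in the data source. The field names (`genus`, `species_epithet`, `authorship`, `year`) and the function `name_simple` are assumptions made for this example; they are not the actual NameDetailed element names.

```python
from typing import Dict, Optional

# Assumed concatenation order for the hypothetical atomized fields.
FIELD_ORDER = ("genus", "species_epithet", "authorship", "year")


def name_simple(detailed: Optional[Dict[str, str]],
                verbatim: Optional[str] = None) -> str:
    """Build the full name string from atomized parts, else use the verbatim string."""
    if detailed:
        parts = [detailed[key] for key in FIELD_ORDER if detailed.get(key)]
        if parts:
            return " ".join(parts)
    if verbatim:
        return verbatim
    raise ValueError("neither atomized fields nor a verbatim string were supplied")


atomized = {
    "genus": "Aus",
    "species_epithet": "bus",
    "authorship": "(L.) Smith",
    "year": "1999",
}

print(name_simple(atomized))
print(name_simple(None, verbatim="Aus bus (L.) Smith 1999"))
```

Because the string is derived from the same fields that are stored, the two representations cannot drift apart within a single concept.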

** If you are 'correcting' or 'normalising' the name, we would record this as making a revision concept, which would be quite simple: it would contain the new name, the relationship 'valid name for' (old concept) and an 'according to' A.Scrutineer, 2004. TrevorPaterson 13Dec2004

b. Choice between <LC:ScientificName> and <LC:VernacularName>

We favour a <NameDetailed> container, and we are not providing a means to atomize vernacular names. They would be captured as strings in NameSimple, and an attribute of <Name> would flag scientific versus non-scientific Names.


2. Components of the <LC:ScientificName> element

a. <LC:Label>

This was called FullName up to LC1.4, and is similar to what TCS requires as NameSimple, i.e. the calculated, concatenated full Scientific Name. But again, for a TCS concept this would be fixed for one concept/According To. NameSimple is required in TCS and should represent the name as present in the AccordingTo publication; if this is not available, we would replace it with what could be calculated from the atomized NameDetailed representation. LC:Label may therefore differ from TCS:NameSimple, as Label should always be calculated from the canonical information. TCS could therefore benefit from having both NameSimple and Label.

** I think you are wanting a world where there is only one concept/name for each 'notion' of a taxon, and you would have a label for this, correct according to the current rules, which might also contain information about earlier and alternative variants. We represent all these alternative name variants as (components of) concepts in their own right. E.g. if Linnaeus publishes a name/concept in 1753, that is one (original) concept; then A.Scrutineer decides there is some grammatical or typographic error to be 'corrected' in 2004, and we record this as a separate revision concept pointing at the original one. I think you want to somehow change or replace Linnaeus' concept according to your current view on things? We see Linnaeus' original concept as unaltered: you can track more recent concepts that refer to his; for example, you could look for relationships 'valid name for' this concept to get a 'valid name' (though of course there may be several different ones over time...). TrevorPaterson 13Dec2004
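The revision-concept approach described in the comment above might look like this in outline. Everything here is hypothetical: the misspelt original name, the concept IDs and the helper `valid_names_for` are invented purely to show that the original record is left untouched while later 'valid name for' concepts are found by lookup.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Concept:
    concept_id: str
    name: str
    according_to: str
    year: int


@dataclass(frozen=True)
class Relationship:
    from_id: str
    kind: str
    to_id: str


concepts: Dict[str, Concept] = {
    # Invented original concept with a spelling later judged to need correction.
    "c1": Concept("c1", "Aus buss", "Linnaeus, 1753", 1753),
    # Revision concept carrying the corrected name; the original is not edited.
    "c2": Concept("c2", "Aus bus", "A.Scrutineer, 2004", 2004),
}

relationships: List[Relationship] = [
    Relationship("c2", "valid name for", "c1"),
]


def valid_names_for(target_id: str) -> List[Tuple[str, str]]:
    """List (name, according to) for concepts asserting a valid name, oldest first."""
    hits = [concepts[r.from_id] for r in relationships
            if r.kind == "valid name for" and r.to_id == target_id]
    return [(c.name, c.according_to) for c in sorted(hits, key=lambda c: c.year)]


print(concepts["c1"].name)    # original spelling is preserved
print(valid_names_for("c1"))  # later assertions of a valid name
```

A list is returned rather than a single value because, as noted above, there may be several different 'valid names' asserted over time.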

Gregor’s notes suggest that Label + ID defines a Name, and that all other information is optional. Where/What is the ID? Is the name fixed; can there be multiple Name instances with the same Label? Is there one ID for one Label? Can the Label associated with an ID change over time (with changes to the code, changes in status according to the code etc.)? These issues make this model of an LC Name problematic if it is included as a stable, immutable component of a Taxon Concept.

** We really don't see names as existing separately from concepts, or being merely address labels for concepts. Again, we think (name+definition+accordingto) = concept (with an ID). Your model above seems to be: NameA, NameB, and/or NameC point to an ID for a 'taxon'. What is this ID/Taxon, especially in a Nomenclaturist world which may lack Definitions and AccordingTos? TrevorPaterson 13Dec2004

b. <LC:CanonicalName>

This is the meat and the gravy of how a Name is represented, and in version 1.5 it looks like an elegant XML structure to contain all possible formats. Is this true? We are a bit confused about where the rank abbreviations subsp., var. etc. would go. Are they implicit, to be added when the name is calculated, and therefore reliant on a programmatic algorithm dependent on the rank and kingdom details? Or are var., subsp. etc. NOT part of the calculated name in Label?

(We would not favour relying on users applying a calculated algorithm to generate Label....)
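To make the concern concrete, here is a minimal sketch of the kind of rank- and code-dependent algorithm a consumer might need if rank abbreviations were left implicit. It is not LinneanCore's algorithm; the marker table, the parameter names and the function `calculated_label` are our own assumptions. It relies only on the general convention that botanical infraspecific names carry a rank term while zoological subspecies are written as plain trinomials.

```python
from typing import Optional

# Assumed lookup of rank terms for botanical names (illustrative, not exhaustive).
RANK_MARKERS = {"subspecies": "subsp.", "variety": "var.", "form": "f."}


def calculated_label(genus: str, species: str,
                     infra_epithet: Optional[str] = None,
                     rank: Optional[str] = None,
                     code: str = "botanical") -> str:
    """Concatenate name parts, inserting a rank term only where the code uses one."""
    parts = [genus, species]
    if infra_epithet:
        if code == "botanical":
            # The marker depends on the rank, so rank must be known to the caller.
            parts.append(RANK_MARKERS[rank])
        parts.append(infra_epithet)
    return " ".join(parts)


print(calculated_label("Aus", "bus", "cus", rank="subspecies", code="botanical"))
print(calculated_label("Aus", "bus", "cus", rank="subspecies", code="zoological"))
```

Even this toy version needs both the rank and the nomenclatural code to produce a string, which is the dependency we would prefer not to push onto users.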

** We think any algorithms required to parse a NameDetailed representation should be as simple as possible, and that no algorithm should be required from a user, which is why in TCS NameSimple contains the string representation of the full scientific name, identical to that generated from NameDetailed, so any user can recover this with no algorithmic processing. Obviously the original parsing of the name string in a publication to NameDetailed or CanonicalName involves an algorithm, so the reverse algorithm must be required to get it back.

c. <LC:CanonicalAuthorship>

This is composed of Protonym and (optionally) NewCombination Authorship. These two elements seem to go beyond ‘Authorship’ in providing full reference information. The level of detail included resembles an AccordingTo for a TCS Taxon Concept; i.e. there are name strings and a citation. This contrasts with ABCD, which does not attempt to identify (atomize) separate authors nor represent the citation. ProtonymAuthorship therefore overlaps (or conflicts) with the representation of an ‘original concept’ in TCS. Unlike TCS AccordingTo Authors and ABCD Author Strings, LC Authors are atomized, and may even reference an Agent ID where available. This atomization would seem to be a somewhat artificial tokenization if it is used to store ‘ex’, ‘:’ etc.

An explicit statement of the Year of publication seems to be lacking and would need to be recovered from whatever citation representation is chosen.

d. <LC:NameExtensions>

Seems a bit untidy, and doesn't overlap fully with what is held in ABCD (Breed, Trade Name, Individual). Inclusion of <ConceptExtension> allows the Name Schema to now represent Concepts.

e. <Rank>

Enumerated attributes seem very complicated..... May indicate that TCS has to improve its representation of Rank!

We would consider Rank to be part of a classification, and part of the information comprising a Taxon Concept, not part of the name.

** How is it not a change in classification? The first person classified this taxon as Rank A; the second person accepts the first taxon concept but asserts that it is at rank B. The genus classification of a species is explicit in the binomial name, i.e. this is definitely where taxonomic classification and Nomenclature coincide! TrevorPaterson 13Dec2004

f. <Protonym> (and consider <Teleomorph>, <IsNovum>)

LinneanCore is proposing to treat as attributes information that TCS would represent as Relationships to other Concepts, i.e. it is in fact implementing a similar model of a name as a concept that is related to other names (concepts). The protonym would be represented by a GUID of another Name instance; what, therefore, is the relationship/stability between IDs and Names? (See 2a above.)

g. <Nomenclature>

(We would agree with RP's comment that Code is required, not optional – this is equivalent to recording the kingdom in TCS, and is also implicitly represented in ABCD by the choice of name structure used. It may make more sense for TCS to record Code rather than Kingdom, or possibly both items?)

We may be misinterpreting the XML structure intended here: we are reading this as meaning that a sequence of separate statements/observations can be recorded against one Name object.

Recording the Status of a Name is a historically subjective matter, and is unstable. If a Name is recorded as ‘Rejected’, this reflects a decision taken by a later author on an earlier Name; i.e. presumably the original creator of the name considered it ‘accepted’. Does the same name now have multiple records, marked up by different authors at one time? Do these have separate identities, or is all this (contradictory) information stored in one name object (as is allowed by this being an unbounded sequence)? In TCS, tracking these ‘versions’ of names is done by referencing back from one concept to an earlier one, with a relationship such as ‘replaces’. Tracking (as done in LC) would seem to be more complicated if all the information is stored as one XML sequence in one Name object and there are no specific pointers between different concept/name objects. Recording BlockingNames again represents relationships sensu TCS as attributes of a Name. NomenclaturalSources is now providing an AccordingTo authorship for each of these Nomenclatural status remarks (which would be represented as separate but related concepts in TCS). (Will the ‘simple citation’ type used here be superseded by ‘Alexandrian Core’, or a reference to a publication etc.?)

h. <EditorialStatus>

This might be useful as a quality statement about data – but probably should go in Metadata. Seems over detailed – with citation sources etc.

i. <TypeSpecimen>

In agreement with RP, we would be happy for this to be considered part of the TCS concept definition…

j. <CurrentFamilyPlacement>, <CurrentAcceptedName>

Are these just provider-specific cheat fields? Should a common schema provide provider-specific elements? – could be allowed by providing an extension mechanism…. In TCS these could be represented by simple relationships to other concepts.