TCSchemaWiki - Home Page

Post TDWG2004: The LinneanCore is being actively discussed and developed as a schema for taxonomic names on its own WIKI.

See also the post-TDWG2004 discussion here: TCSAndTheLinneanCore

GregorHagerdorn 24/05/2004 15:18:

Thanks Jerry for advancing this! I fully agree with your position that the "taxon concept standard" should be a second layer that is based on a "taxon names standard".

Clearly, linking data to taxon concepts is much more powerful than linking them to nomenclatural name objects, but it is also several orders of magnitude more difficult. I can hire people to do the linking to names for our database containing 180000 plant-fungus relations on a names basis: that is 26430 different fungal and 31166 different plant names, and the linking to name identifiers can partly be automated (with our current technique of generating and matching name variants, about 60% can be linked automatically). The rest is manual work, but requires only a degree of taxonomic understanding that can be achieved by training.
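A minimal sketch of what "generating and matching name variants" could look like, for illustration only. The normalization rules, the function names `name_variants` and `link_names`, and the catalogue identifiers are assumptions made for this example; they are not the technique actually used on the database described above.

```python
import re
import unicodedata


def name_variants(name: str) -> set[str]:
    """Generate a few normalized spellings of a name (illustrative rules only)."""
    # Strip accents, collapse whitespace, and lowercase.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    base = re.sub(r"\s+", " ", ascii_name).strip().lower()
    variants = {base}
    # Hypothetical rule set: treat some common orthographic alternations as equivalent.
    for old, new in (("ae", "e"), ("oe", "e"), ("ii", "i"), ("y", "i")):
        variants |= {v.replace(old, new) for v in variants}
    return variants


def link_names(raw_names, catalogue):
    """Map each raw name to a catalogue identifier, or None if manual work is needed."""
    # Index every variant of every catalogue name.
    index = {}
    for name_id, canonical in catalogue.items():
        for variant in name_variants(canonical):
            index.setdefault(variant, set()).add(name_id)
    links = {}
    for raw in raw_names:
        hits = set()
        for variant in name_variants(raw):
            hits |= index.get(variant, set())
        # Link automatically only when the match is unambiguous.
        links[raw] = next(iter(hits)) if len(hits) == 1 else None
    return links
```

Names that come back as `None` are the ones that would go to trained staff for manual linking; ambiguous matches are deliberately not resolved automatically.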

In contrast, to link the data to taxon concepts I would need taxonomic experts who evaluate the various concepts used in each publication (plus a single publication may implicitly use different concepts for different plant-pathogen combinations). So almost all 360 000 concepts in the database would have to be researched by true specialists - which I have no funding for.

Even though linking to names is less expressive, it is significant progress over not being able to relate to other name-based data at all! Using a default assumption that the various concepts are largely overlapping, I can then dynamically link into taxon concept databases. I can mark the data on the user-interface as being uncritical - and I can elevate a small amount of data to taxon concept level where problems are detected, or expertise is available.

I think this works only if taxon concepts offer a common language to be connected with taxon names. I may overlook something, but I think this means that concepts use a names standard as their foundation (2 layers). Otherwise my name-based data (and data expressing only homotypic synonymy) cannot link with higher-order synonymy and species circumscription concepts.

This is the basic problem I fear may arise with the development of the Taxon Names standard prepared by Jessie's subgroup. It primarily aims at taxon concepts and is expected to be developed in a way that it can also express purely nomenclatural names data.

That seems to be a polymorphism, not a 2 layer design. It would support delivering both concept and name data collections to humans, but it would NOT necessarily allow relating concepts and names in machine processes. I believe that the second is much more important than the first.

Can someone from the TaxonNames TDWG subgroup deliver a draft explanation of how this use case works with the current architecture if TaxonConcepts are not a higher layer that is based on a common TaxonNames standard? I will be happy to be convinced that my linking use case with the data, as sketched above, will work and that concepts can be expressed in terms of references to name objects curated by nomenclature databases.

Karen says: " But, stepping back a bit, as you say, without a solid foundation of nomenclatural data associated with names, the subsequent 'packaging' of those names into taxon concepts will be severely limited. For this reason, I would have assumed that the 'Names Standard' work by Jessie Kennedy et al. would be taking nomenclatural matters into consideration. "

I think this is exactly basing concepts on names, not offering alternative names and concepts expressions.

GregorHagerdorn Tue 25/05/2004 23:28:

(Here are some comments on specifics of the schema. This does not mean I have an overview of the entire schema... I am very interested in the further development of LC. I believe a nomenclatural standard is needed and should become an integral part of the TDWG/GBIF infrastructure - whether this is by basing taxon concepts optionally on LinneanCore, or whether the results of the LinneanCore discussion are fully integrated into a common TaxonNames and TaxonConcepts standard as developed by the TDWG TaxonNames subgroup.)

---

The purpose of the elements called "LocalKey" remains unclear to me. What is a local key and why is it expressed in an exchange standard? Or: what is the scope of local: database, xml-document? Will the key be unchanging over time or not?

The possibility of using LSIDs should be explored. This is a problem for all data standards - and for GBIF. At the moment ABCD units have a key made from 3 data elements, SDD has a Globally unique ProjectID plus Project-local integer IDs for all objects (similar to OWL IDs). LSIDs seem to be attractive, since they are immutable, location independent, but also resolvable. We need to resolve the object ID problem in the next weeks, so any input is very important to all TDWG subgroups. See http://efgblade.cs.umb.edu/twiki/bin/view/SDD/ProxyDataModel for a beginning of a discussion.
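For readers unfamiliar with LSIDs, here is a small sketch of how such an identifier breaks down into parts, assuming the usual form `urn:lsid:authority:namespace:object` with an optional trailing revision. The class and function names and the example authority `example.org` are made up for illustration; resolution of the identifier is not shown.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    # Expect the two fixed prefixes plus three mandatory parts and one optional part.
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier for a name object held by a nomenclature database.
example = parse_lsid("urn:lsid:example.org:names:12345")
```

Unlike a database-local key, every part needed to find the issuing authority travels with the identifier itself, which is what makes it attractive as a cross-standard object ID.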

---

"ProtonymURI/LocalKey": It is unclear to me why this is not already covered by the combination of a name object ID plus cited reference ID?

ThisNameURI and ThisNameLocalKey should just be NameURI and NameLocalKey. I think the reflective "This" is superfluous and confusing because it seems to indicate a change in scope (else all elements should be prefixed with "This"!).

---

Publications: I fully agree with Jerry's comments on the need for a common type. I believe that a structure would be desirable that either provides the information itself, or provides it by reference to an external provider. Again, SDD uses proxy types to abstract from the actual methods of linking (URL, LSID, Webservice, etc). At the Berlin meeting last week ABCD, SDD, and TaxonNames agreed to try to make this a fundamental common infrastructure feature.

Based on Jerry's LinneanCore SimpleCitation proposal I made an attempt to slightly restructure it, base it on the Proxy proposal, and add a few missing elements. Please take a look at it at (http://efgblade.cs.umb.edu/twiki/bin/view/SDD/ProxyDataPublicationProxy) and criticize heavily - preferably by using the WIKI! The SDD starting page (http://efgblade.cs.umb.edu/twiki/bin/view/SDD/WebHome) tells you how to register on the WIKI. You can, but don't have to, add yourself to the email notification list. Especially as outsiders, you may not want to do this, just use Changes to see recent changes. Testing the WIKI may be useful to help ECAT decide whether ECAT should use a WIKI or not!

Also to be found on the WIKI ProxyDataPublicationProxy page: a special version of SDD only containing the publication proxy, and a link leading to ProxyDataPublicationNotesOnLinnCoreSimpleCitation (detailed annotations of Jerry's model).

---

A major point is that I believe NameRecordType/Hierarchy is misconceived (or perhaps I don't understand it). Clearly, there is a need for a "ParentName" if the canonical name contains only the monomial, or the last epithet in the name components. However, this hierarchy must stop at the top = Genus. The proposed FlatHierarchy does not show this; to me it seems to be a fully developed taxonomic CONCEPT hierarchy.

In addition I think that an "OriginalTaxonomicPlacement" element, defined as "Unconstrained expression of the taxonomic placement by the original author. Example: 'Algae' for a name whose type is later discovered to be a fungus.", may be helpful in many circumstances. This could, however, also be expressed in a more general commenting mechanism.

---

DonaldHobern Thu 03/06/2004 11:55:

I believe you have seen the document and schema that Jerry Cooper has put together under the title “Linnaean Core” (otherwise please see the attachments). This was something Jerry and others suggested in Oaxaca as a way to start sharing some name data with GBIF quickly in the period while we are still awaiting a TDWG names/concepts standard and some associated protocol. The purpose he had in mind was to produce a flat structure which could be shared using existing DiGIR software. This is not intended as a rival activity to what you have been doing. From the Edinburgh meeting my personal feeling is that we are progressing fast enough that we may be able to avoid using an interim standard. I know however that at the very least this piece of work has helped Jerry and Gregor to crystallise a view of something which does worry them about the Napier schema today. My perception is that this is a misunderstanding of how the schema is fitting together in at least two areas:

The first of these is that your schema intends all names to be modelled as separate TaxonConcept elements linked by Relationships referring to other TaxonConcept elements. To be fair, Jerry recognises this but feels that we are burying some absolute facts about names among a set of other relationships which represent taxonomic judgments. I believe however that we should be able to resolve these concerns if we develop a good set of values for the Relationships/TaxonConcept/@type attribute and define the inference relationships for these different values. As this was a general point that was made at Edinburgh I think we are all agreed that this needs addressing.

I think that the second issue is that Jerry may have been misled by the element name “TaxonConcept”. An element of this type containing only the nomenclatural data for a name without any circumscription elements seems to me to correspond precisely to a name record of the kind which Jerry wants to model. All of the real “concept” parts are additional content to the nomenclatural data. My view is that your TaxonConcept element could be renamed without much loss to “NameUsage” and then become much more acceptable to Jerry. (I’m not saying that we should necessarily do this – just that we need to make sure that objections are over something substantive rather than just names). In short I suspect that the issue boils down to Jerry expecting a concept document to look something like:

<conceptList><concept><reusableNameElement/></concept></conceptList>

(where the name is a kernel that gets embedded as a label inside a concept element) rather than:

<conceptList><name><optionalConceptElement/></name></conceptList>

(where a name element includes an optional conceptual payload). I believe that the second of these is what you have modelled and that this is actually the more flexible model (since it allows all lists to belong to the same schema).

I have asked Jerry for more clarification of his views (because I do not wish to misrepresent him). There are also several other nomenclatural elements in his schema which may need consideration for how we include them, either in a revised version of the ABCD TaxonName element or within your Name wrapper element. I hope to spend a little time looking at some of his more specifically nomenclatural elements to see whether there is anything that is not covered by well-defined Relationships/TaxonConcept elements and which we should therefore consider including.

It would be really useful to get the latest working draft on the web for comments. Do you have a plan to do this? If not, would you be happy for me now to load it up into our CIRCA server and to open a mailing list for comments? I would place it into a public area of CIRCA so that there would be a URL for download even for people without CIRCA ids.

While I was typing paragraph 1 above, I went back to check element names in the schema and again got really confused by the way that the current version I have uses the different TaxonConcept elements. I would expect the Relationships element to contain a set of Relationship elements each of which held a type attribute and a ref attribute.
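To make the expected shape concrete, here is a small sketch that reads a Relationships element whose Relationship children each carry a type and a ref attribute. The sample document, the identifiers, and the two type values are invented for illustration; the actual vocabulary for the type attribute is exactly what still needs to be agreed.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: attribute values are placeholders, not schema vocabulary.
SAMPLE = """
<TaxonConcept id="c2">
  <Relationships>
    <Relationship type="has basionym" ref="c1"/>
    <Relationship type="includes" ref="c7"/>
  </Relationships>
</TaxonConcept>
"""


def relationships(concept: ET.Element) -> list[tuple[str, str]]:
    """Return (type, ref) pairs for every Relationship inside Relationships."""
    return [
        (rel.get("type"), rel.get("ref"))
        for rel in concept.findall("./Relationships/Relationship")
    ]


pairs = relationships(ET.fromstring(SAMPLE))
```

With this shape a consumer can separate purely nomenclatural relationships from taxonomic judgments by filtering on the type value, provided the set of values is well defined.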

One thing which Jerry did include in his Linnaean Core schema was a SimpleBioStatusType element with elements for identifying a geographic region and whether the taxon is endemic, indigenous, extinct, etc. I pushed back on modelling this in a name schema in this way because it seemed to represent a large extension into modelling extra information. On the other hand these are precisely the elements which appear in many checklist data sets. It would be good from my standpoint to make sure that we can model these using a single schema. This may mean that I end up advocating something rather like the Payload element you added for me and which I said I did not want. On the other hand it would be even better to model an extra (optional) element which can appear inside a TaxonConcept to allow checklist producers to share this information in a standard way (e.g. using TDWG region codes). I am particularly keen to think this through since I was talking yesterday to Frank Bisby who mentioned some ways in which the presence of synthesised distribution data has made ILDIS much more useful than it would otherwise be. I’m not really asking you to do anything about this now, just to be aware that I may come along in a few weeks with a suggested additional element to be included in the schema.

GregorHagerdorn:

* The first of these is that your schema intends all names to be modelled as separate TaxonConcept elements linked by Relationships referring to other TaxonConcept elements. To be fair, Jerry recognises this but feels that we are burying some absolute facts about names among a set of other relationships which represent taxonomic judgments. I believe however that we should be able to resolve these concerns if we develop a good set of values for the Relationships/TaxonConcept/@type attribute and define the inference relationships for these different values. As this was a general point that was made at Edinburgh I think we are all agreed that this needs addressing.

Contrary to my signature I am aware I may be wrong, but my analysis is that the type is in the object, not in the relationship. Also I think there is a validation issue: setting a type on the relation seems the wrong way to do it, if really you can validate it by knowing which elements have been used.

I would propose the Taxon standard to be:

TaxonItem = abstract base type
TaxonName based on TaxonItem = validated to ONLY contain nomenclatural data.
TaxonConcept based on TaxonItem = a composition of Concept/Opinion knowledge plus a composition to contain the name object

This allows you to define relations or trees based on the TaxonItem base type if so desired, but xsi:type could define which type is used.
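The proposal can be sketched outside XML Schema as a small class hierarchy. This is only an analogy, assuming Python dataclasses in place of schema types: the field names, the `Relation` class, and the helper `item_type` (standing in for what xsi:type would report) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonItem:
    """Base type; relations and trees can be defined against it."""
    key: str

    def __post_init__(self):
        # Emulate an abstract type: only derived types may be instantiated.
        if type(self) is TaxonItem:
            raise TypeError("TaxonItem is abstract; use TaxonName or TaxonConcept")


@dataclass
class TaxonName(TaxonItem):
    """Holds ONLY nomenclatural data."""
    canonical_name: str
    authorship: str = ""


@dataclass
class TaxonConcept(TaxonItem):
    """Concept/opinion knowledge plus the name object it is based on."""
    name: TaxonName
    description: str = ""
    synonymy: list = field(default_factory=list)


@dataclass
class Relation:
    """A relation typed against the base, so either derived type may appear."""
    kind: str
    source: TaxonItem
    target: TaxonItem


def item_type(item: TaxonItem) -> str:
    """Report the concrete derived type, analogous to xsi:type."""
    return type(item).__name__
```

Because the concrete type is carried by the object itself, a consumer can tell a pure name record from a concept record without a type label on the relation.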

---

Regarding my phone conversation with you Jessie: I agree that any publication of a newly created name publishes a name object plus a concept.

However, these parts are used independently.

* I think that the second issue is that Jerry may have been misled by the element name "TaxonConcept". An element of this type containing only the nomenclatural data for a name without any circumscription elements seems to me to correspond precisely to a name record of the kind which Jerry wants to model. All of the real "concept" parts are additional content to the nomenclatural data.

I agree, but I think it should be validated by having two separate derived types, see above.

My view is that your TaxonConcept element could be renamed without much loss to "NameUsage" and then become much more acceptable to Jerry.

In short I suspect that the issue boils down to Jerry expecting a concept document to look something like:

<conceptList><concept><reusableNameElement/></concept></conceptList>

(where the name is a kernel that gets embedded as a label inside a concept element) rather than:

<conceptList><name><optionalConceptElement/></name></conceptList>

(where a name element includes an optional conceptual payload). I believe that the second of these is what you have modelled and that this is actually the more flexible model (since it allows all lists to belong to the same schema).

Good point. I argue for the former model. There is nothing identical in the <optionalConceptElement/> part, but a lot in the fundamental nomenclatural object part.

Use case within the TaxonName/Concept model: if I have a recombination (name into new genus) or a nomen novum creation, I refer only to the name object, not to its concept part.

Use case of external use of taxon names data: If I have data from literature, or a specimen label, I need to be able to refer either to the name part, saying: "I can recognize that this is the name that was meant, even though here the authors are abbreviated differently, and there seems to be a typo in name/date etc." - OR - to the concept part, saying: "I believe the concept used here on the label is indeed the one published in the original publication of the name".

---

In the current Taxon Name concept structure, to me this seems to imply that if 3 publications have different circumscriptions of a name, I have to denormalize all nomenclatural data 4 (sic!) times:

ID=1 meaning only the name Xandra alba

Original author 1 Combining Author2 ex ValidatingAuthor, nom. nov. for Zandra alba; published in JournalName 8 (3): 123 (1885); true publication date: 1.5.1888 Validating publication: sdfjkfdsjlk

ID=2 meaning only original concept Xandra alba

Original author 1 Combining Author2 ex ValidatingAuthor, nom. nov. for Zandra alba; published in JournalName 8 (3): 123 (1885); true publication date: 1.5.1888 Validating publication: sdfjkfdsjlk

protologue description published in JournalName 8 (3): 123 (1885);

ID=3 meaning new concept 1 Xandra alba

Original author 1 Combining Author2 ex ValidatingAuthor, nom. nov. for Zandra alba; published in JournalName 8 (3): 123 (1885); true publication date: 1.5.1888 Validating publication: sdfjkfdsjlk

new description
new synonymy
published in JournalName1 111: 3 (1991);

ID=4 meaning new concept 2 Xandra alba

Original author 1 Combining Author2 ex ValidatingAuthor, nom. nov. for Zandra alba; published in JournalName 8 (3): 123 (1885); true publication date: 1.5.1888 Validating publication: sdfjkfdsjlk

new description
new synonymy
published in JournalName2 222: 3 (1992);

To me this seems to be a case of strong denormalization.

Compare the case of my proposed model for concepts derived from base type and using composition or reference within the concept outer shell:

<TaxonName key=1>

Xandra alba

Original author 1 Combining Author2 ex ValidatingAuthor, nom. nov. for Zandra alba; published in JournalName 8 (3): 123 (1885); true publication date: 1.5.1888 Validating publication: sdfjkfdsjlk

<TaxonConcept key=2>

and either embedded (leaving it to the consumer to detect the denormalization, but providing for detection by giving a key!)

<TaxonName key=1>

Xandra alba

Original author 1 Combining Author2 ex ValidatingAuthor, nom. nov. for Zandra alba; published in JournalName 8 (3): 123 (1885); true publication date: 1.5.1888 Validating publication: sdfjkfdsjlk

or by reference

<TaxonName ref=1>

protologue description published in JournalName 8 (3): 123 (1885);

<TaxonConcept key=3>

meaning new concept 1 <TaxonName ref=1>

new description
new synonymy
published in JournalName1 111: 3 (1991);

<TaxonConcept key=4>

meaning new concept 2 <TaxonName ref=1>

new description
new synonymy
published in JournalName2 222: 3 (1992);
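A rough sketch of how a consumer might resolve such references, assuming a well-formed rendering of the structure above. The wrapper element `TaxonList`, the element `AccordingTo`, and the function name are invented for this example; only the key/ref idea is taken from the proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version: the name is stored once, concepts point to it.
NORMALIZED = """
<TaxonList>
  <TaxonName key="1">Xandra alba</TaxonName>
  <TaxonConcept key="2"><TaxonName ref="1"/><AccordingTo>JournalName 8 (3): 123 (1885)</AccordingTo></TaxonConcept>
  <TaxonConcept key="3"><TaxonName ref="1"/><AccordingTo>JournalName1 111: 3 (1991)</AccordingTo></TaxonConcept>
  <TaxonConcept key="4"><TaxonName ref="1"/><AccordingTo>JournalName2 222: 3 (1992)</AccordingTo></TaxonConcept>
</TaxonList>
"""


def resolve_concepts(xml_text: str) -> dict[str, tuple[str, str]]:
    """Map each concept key to (name text, publication) by following ref to key."""
    root = ET.fromstring(xml_text)
    # Top-level TaxonName elements carry a key and hold the nomenclatural data once.
    names = {n.get("key"): (n.text or "").strip() for n in root.findall("TaxonName")}
    resolved = {}
    for concept in root.findall("TaxonConcept"):
        ref = concept.find("TaxonName").get("ref")
        resolved[concept.get("key")] = (names[ref], concept.findtext("AccordingTo", default=""))
    return resolved


concepts = resolve_concepts(NORMALIZED)
```

All three concepts resolve to the same name object, so a correction to the nomenclatural data needs to be made in one place only, and name-based data can link to key 1 without choosing a concept.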

4. One thing which Jerry did include in his Linnaean Core schema was a SimpleBioStatusType element with elements for identifying a

I agree this should be separate from LinneanCore, but combinable.

Linnaean Core Discussion