What is the difference between two "versions" of the same concept and two separate concepts?
In my mind, a concept is a fixed entity that does not change over time -- it was born with a certain scope, and maintains that scope in perpetuity. If the same author(s) of the original concept subsequently invoke essentially the same concept, but with a slight variation, my feeling is that it should be treated as a different concept (to whatever extent the slight variation is demonstrable). In other words, if it's different enough that it needs to be tracked by a different version number, why isn't it different enough that it should be tracked as a separate concept?
Maybe a specific set of examples would be helpful?
The comment is simply a wholehearted endorsement of your recommendation for a single "taxaserver.org" domain for the SEEK LSIDs. The two biggest reasons I would favor this approach are: 1) politics (the single largest reason why what SEEK is trying to do hasn't already been done, in my opinion), and 2) the less information you embed within the ID number, the better (again, in my opinion).
"different." In a shallow sense, the subsequently invoked concept is new just by virtue of having a different time stamp. In a deep sense, the added-on information may or may not represent something that the author "came up with" and that wasn't there before.
I guess what I meant by "new" and "different" was the implied scope of organisms that would be included within a given concept. A simple datestamp would not affect the implied scope of organisms included within the concept. If the added-on information you allude to changes the scope of organisms that would be included within the concept, then it seems to me that a new concept should be defined. If the added-on information only clarifies the boundaries of the same scope of organisms, then it seems to me to be simply a reference back to the original concept (not a new "version" of the concept).
original concept package? Surely there's a sense of "newness" here, i.e. the new recognition of an earlier typo.
RichardPyle: Yes, but does the alteration/correction of what really amounts to metadata for the concept really require that a new ID be issued (in this case, a new ID assigned to a different version of the same concept)? It seems to me that the whole purpose of creating a GUID for concepts is to avoid the need to track such trivial changes in metadata. For example, if my dataset pointed to an LSID to indicate a particular concept, it wouldn't matter whether the author of a species epithet used to represent that concept was spelled "Lacépède", "Lacepède", or "Lacepede" (unless that name was embedded as part of the GUID -- which is an entirely different topic of discussion).
Maybe I'm misunderstanding the purpose of the GUID as used for taxonomic concepts?
NicoFranz: Still I'd say that's unrelated to taxonomy proper and would call for versioning of one and the same concept, even and particularly in the shallow sense.
RichardPyle: I guess my question is: are the sorts of metadata details at risk of needing correction (spelling errors, typos, etc.) the kinds of thing that need to be tracked via the GUID itself? In my mind, the GUID would represent a conceptual scope of organisms (i.e., circumscription); and therefore if the scope of organisms does not change, then no additional GUID is needed to represent a new "version" of the same concept (which is different from the situation where two separately-defined concepts may be deemed to be congruent).
NicoFranz: If, on the other hand, we have a statement in a different publication, at a different time, with the same or different circumscription content, there'd be a new (at least shallowly speaking) core entry in the database. That entry could be versioned too if it was transferred with unfortunate mistakes or incompleteness.
RichardPyle: O.K., if "version" simply means the correction of inadvertent, objectively discernible errors in metadata, then my feeling is that there is really no need to track such metadata corrections within the body of the GUID itself. In the case you mention of a "potentially" different concept, this is clearly a case for a separate concept GUID, which may or may not be secondarily mapped as "congruent" with the original GUID.
The key distinction here is whether the "later recognition" that something should change addresses taxonomic issues or more mechanical, string-transporting ones. In the latter case, I believe we're tending towards versioning; in the former, towards separate concepts that should then somehow be related to each other.
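A minimal Python sketch of one reading of this distinction, assuming each stored record carries a stable publication key, an explicit set of included organisms, and a bag of transcribed strings. These field names are hypothetical and not part of any SEEK schema. A change in source or circumscription is labeled taxonomic; a change confined to the strings is labeled mechanical.

```python
from dataclasses import dataclass


@dataclass
class ConceptRecord:
    """Hypothetical record: one provider's copy of one concept."""
    publication_id: str          # stable key for the "sec" publication
    circumscription: frozenset   # identifiers of the organisms included
    metadata: dict               # transcribed strings (name, author spelling, ...)


def classify_change(old: ConceptRecord, new: ConceptRecord) -> str:
    """Label the difference between two records as taxonomic or mechanical."""
    if (old.publication_id != new.publication_id
            or old.circumscription != new.circumscription):
        # Different source or different set of organisms: taxonomic.
        return "separate concept"
    if old.metadata != new.metadata:
        # Same source and circumscription, only the strings differ: mechanical.
        return "version"
    return "unchanged"


pyle = ConceptRecord("pyle2003", frozenset({"P1", "P2", "P3"}), {"author": "Pyle"})
pile = ConceptRecord("pyle2003", frozenset({"P1", "P2", "P3"}), {"author": "Pile"})
print(classify_change(pyle, pile))  # version
```

Deciding that two records point at the same publication in the first place (e.g. "Pyle" vs. "Pile") is assumed to happen upstream; the sketch only encodes the rule once that judgment has been made.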
RichardPyle: In summary, I guess my point is that, in my mind at least, the whole purpose of the GUID is to get away from having to track the mechanical, string-transporting issues and allow focus specifically on the circumscription (taxonomic) issues. Let the metadata be tied to the GUID at the central registry, and corrected as needed. There's no harm in preserving a log of all changes to metadata, but I don't see why such changes would require generating new GUIDs (in the form of a new "version" of the same concept).
Again -- it's very possible that I'm missing something here. But it just seems to me that if you're going to go to all the trouble to establish a GUID system, then its advantages (which, in my view, include relieving everyone of the need to independently keep track of various versions of the metadata tied to each concept) should be maximized.
I'm a little concerned that I may not be making my point clear here, so let me know if anything doesn't make sense.
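To make this proposal concrete, here is a hypothetical Python sketch of such a central registry: each GUID is bound to a fixed circumscription, metadata corrections overwrite the current value, and every correction is appended to a change log instead of minting a new GUID. The `urn:uuid:` identifiers merely stand in for whatever GUID scheme (e.g. LSIDs) is actually adopted, and the class and method names are illustrative only.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Entry:
    circumscription: frozenset   # fixed when the GUID is minted
    metadata: dict               # current, corrected metadata
    log: list = field(default_factory=list)  # (key, old value, new value)


class ConceptRegistry:
    """Hypothetical central registry: one GUID per circumscription."""

    def __init__(self):
        self._entries = {}

    def register(self, circumscription, metadata):
        guid = f"urn:uuid:{uuid.uuid4()}"  # stand-in for the real GUID scheme
        self._entries[guid] = Entry(frozenset(circumscription), dict(metadata))
        return guid

    def correct_metadata(self, guid, key, value):
        # Overwrite in place and log the change; the GUID stays the same.
        entry = self._entries[guid]
        entry.log.append((key, entry.metadata.get(key), value))
        entry.metadata[key] = value

    def resolve(self, guid):
        entry = self._entries[guid]
        return entry.circumscription, dict(entry.metadata)

    def history(self, guid):
        return list(self._entries[guid].log)


registry = ConceptRegistry()
guid = registry.register({"P1", "P2", "P3"}, {"author": "Pile", "year": "2003"})
registry.correct_metadata(guid, "author", "Pyle")
print(registry.history(guid))  # [('author', 'Pile', 'Pyle')]
```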
NicoFranz: I do think there are some misunderstandings, though they can be resolved. First, perhaps unlike Taxonomer, SEEK is really a database of databases. Right next to the high-quality, revision-derived concepts, we need to store versions of much coarser summaries across large groups (e.g. as maintained by ITIS). Only if we set a low threshold for what counts as a concept can we meet the challenge of connecting uses of names by entities like ITIS to the meanings of those names as specialists think about them. So there's the inflation-of-concepts step. The secondary inflation-reduction step is partly the responsibility of the GUIDs, and partly that of connecting concepts. But much of what we will have to do is driven by the availability of data. For example, if we don't allow ITIS to have GUIDs for their largely repetitive information, then we'll have an empty database. Demanding that only real differences in taxonomic opinion stand in the database and be referenced won't work. One COULD choose to mark some identifiers of rather ambiguous, "shallow", or "pointing" concepts as unavailable to the public for referencing. Higher-quality concepts could have published GUIDs.
In short, I think that we (SEEK and you) share the same intuitions about errors in metadata. But, realistically, we're not in a position to assign GUIDs to referenced names ONLY when we have reason to believe that a different taxonomic carving-up of the world is implied. If we did that, we'd alienate some of our main providers of information. I personally think that the taxonomist's intuitions about what's really out there will be expressed mostly through concept relations, not through GUIDs. GUIDs will be neat for users, yet it's the connections among them that taxonomists deal with. And solid relations can only exist between well-circumscribed (deeper) concepts.
RichardPyle: On this, I agree 100%!! But my question was really more about how you distinguish between constructing a new GUID that represents a different "version" of the same "concept" and constructing a new GUID that represents a different "concept". Clearly, if there need be more than one version of the same concept, then some difference has been identified. If version differences are restricted to typographical sorts of issues in what is otherwise clearly the same concept entity (Name Sec Reference), then the question is whether such issues need to be intentionally tracked by different GUIDs ("intentionally", to distinguish them from the inevitable inadvertent duplicate entries).
However, if a "version" spans more than one "Name Sec Reference" instance, then it seems to me to be a potentially subjective decision as to whether we're talking about a different version of the same concept, or two potentially different concepts.
I would've said that referring to the same name/reference/date intersection is the cut-off point for a GUID. That's the first of your two options. Though I'm not exactly sure how ITIS does and should do things. They're kind of silently revolutionizing taxonomy as we knew it and we are building to keep up (or so I feel sometimes). When ITIS puts out a new version of ITIS, en bloc and at a particular time, maybe that should really be handled as a mix of old and new GUIDs, depending on what (little, probably) has changed.
O.K., that makes sense. So, then, the different "versions" would all be based on the same Name Sec Reference? But the reasons for the multiple versions would be...only typos & such? Or might they involve different ideas about the scope of the circumscription itself? Yeah....I'm not sure, exactly, how TSNs would map to SEEK GUIDs. They're not exactly names, and they're not exactly concepts....
I guess that's why I'm not clear on what constitutes examples of "two versions of the same concept", vs. "two separate concepts". Perhaps some specific examples would help clarify?
For example, Allen et al. 1998 and Debelius et al. 2003 both treat the name "Paracentropyge" as a distinct genus from "Centropyge"; whereas Pyle 2003 treats Paracentropyge as a subgenus of Centropyge.
All three references include the same 3 species within Paracentropyge, so all 3 concepts of Paracentropyge are congruent. The first two references apply the name "Centropyge" to the same concept circumscription (i.e., exclusive of the 3 Paracentropyge species); whereas the third reference applies the name "Centropyge" to a broader concept circumscription (i.e., inclusive of the 3 Paracentropyge).
So, I see six distinct Concept GUIDs here:
GUID | Concept description
1 | Centropyge Sec Allen et al. 1998
2 | Paracentropyge Sec Allen et al. 1998
3 | Centropyge Sec Debelius et al. 2003
4 | Paracentropyge Sec Debelius et al. 2003
5 | Centropyge Sec Pyle 2003
6 | Paracentropyge Sec Pyle 2003
The congruencies among these would be:
1 = 3
2 = 4
2 = 6
4 = 6
5 = (1 + 2)
5 = (3 + 4)
1 excludes 2
3 excludes 4
5 includes 6
(...and other redundant/implied logical equivalencies)
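If each circumscription is written out as an explicit set of species, the relations listed above can be checked mechanically. The Python sketch below uses hypothetical placeholder species (P1 to P3 for the three Paracentropyge species, C1 to C3 standing in for the rest of Centropyge), since the actual species lists are not given here. Each concept keeps its own key even when its set equals another's, so "=" is a relation between concepts, not a merge of them.

```python
# Hypothetical placeholder species; the real lists are not given in the discussion.
PARA = frozenset({"P1", "P2", "P3"})         # the three Paracentropyge species
CENT_NARROW = frozenset({"C1", "C2", "C3"})  # Centropyge excluding Paracentropyge

concepts = {
    1: CENT_NARROW,          # Centropyge Sec Allen et al. 1998
    2: PARA,                 # Paracentropyge Sec Allen et al. 1998
    3: CENT_NARROW,          # Centropyge Sec Debelius et al. 2003
    4: PARA,                 # Paracentropyge Sec Debelius et al. 2003
    5: CENT_NARROW | PARA,   # Centropyge Sec Pyle 2003 (includes the subgenus)
    6: PARA,                 # Paracentropyge Sec Pyle 2003
}


def relation(a: frozenset, b: frozenset) -> str:
    """Compare two circumscriptions given as explicit member sets."""
    if a == b:
        return "="
    if a > b:
        return "includes"
    if a < b:
        return "is included in"
    if a.isdisjoint(b):
        return "excludes"
    return "overlaps"


for x, y in [(1, 3), (2, 6), (5, 6), (1, 2)]:
    print(x, relation(concepts[x], concepts[y]), y)
```

In practice the circumscriptions are rarely available as complete member lists, which is one reason the "=" judgments remain third-party interpretations rather than computed facts.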
So...would any of these represent "versions" of the same concept? For example, would anyone ever consider 3 to be a subsequent version of 1? Or 4 a subsequent version of 2?
six concepts - check!; the relations among them - check!; perfect example of how to do it right. Since the deeper intuitions about "how much is new" (e.g. nothing between 1 & 3) are captured in the synonymy relations (tapping myself on the shoulder here...), I see no need to merge them. After all one could (perhaps in other situations) disagree about the "=" judgment, or put things together differently. A merge of 1 & 3 into one GUID would put a lot of burden on the shoulders of that GUID, essentially the same burden that names couldn't lift in the first place.
RichardPyle: O.K., good -- then so far we seem to see things the same way!
Yes, exactly! The "=" is a third-party subjective interpretation, and the data model should treat it accordingly. At this point we're talking about relationships among different concepts; not versions of the same concept.
-- OR --
Suppose ITIS recorded the concepts of 5 and 6; and SP2K recorded the same concepts of 5 & 6, but mis-spelled the author's name as "Pile". Would those, then, represent different versions of the same concepts?
i.e.: Concept 5, Version ITIS | Concept 5, Version SP2K | etc.
NicoFranz: Yes again. When the task is to refer to someone else's concept, which has a fixed extension in space and time (your 2003 paper), the implication is that there exists only one Pyle 2003 view. More or less faulty or complete representations of that view in different databases would be versions of the same concept. What you have here is only 1 concept/GUID, e.g. Centropyge (whoever named it first?) Sec Pyle 2003. I think we don't want to assign GUIDs to versions. In Edinburgh, Jessie made strong arguments for a 1-concept-1-GUID policy that allows versions to take care of sloppy transferring (who knows when we'll have Linnaeus' figures scanned and put into the database?). Your examples capture this very well.
RichardPyle: O.K., good -- that makes it clearer in my mind. So the argument, then, is whether different versions of the metadata associated with unambiguously (objectively) the same concept need the assignment of separate GUIDs -- or whether that sort of metadata "versioning" should be accommodated in another way, with the GUIDs defined directly by the circumscription.
O.K. -- that, then, comes back to my original question. Different "versions" documenting the details of Pyle's 2003 implied circumscription of the name Centropyge Kaup would apparently be assigned different GUIDs, according to Dave's last post. Or do I misunderstand?
I agree with Jessie on this. But...then why extend the GUID to include a version number?
RichardPyle:
Before we delve into arguments about whether different versions need to be tracked by different GUIDs, I think it would be helpful to more clearly define the difference between two versions of the same concept, as opposed to two separate concepts.
DaveThau:
Functionally, urn:lsid:taxaserver.org:3232:1 and urn:lsid:taxaserver.org:3232:2 are treated as different LSIDs.
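For reference, the LSID syntax is `urn:lsid:<authority>:<namespace>:<object id>[:<revision>]`, so a version is just an optional final segment. The Python sketch below splits an LSID into those parts; the namespace `concepts` in the examples is a hypothetical addition, since the identifiers quoted above omit the namespace segment. Two revisions compare as different identifiers, yet they share the same authority/namespace/object prefix.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:<authority>:<namespace>:<object id>[:<revision>]."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def same_object(a: LSID, b: LSID) -> bool:
    """True if two LSIDs differ at most in their revision."""
    return a[:3] == b[:3]


v1 = parse_lsid("urn:lsid:taxaserver.org:concepts:3232:1")
v2 = parse_lsid("urn:lsid:taxaserver.org:concepts:3232:2")
print(v1 == v2, same_object(v1, v2))  # False True
```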
RichardPyle: Right -- that's how I understood it.
DaveThau:
Versioning gives concept authors the ability to simultaneously issue a new LSID to a concept which has changed, and provide a nice chain through which the evolution of their concept might be elucidated. The two versions of the concept would have different LSIDs, and systems using LSIDs would clearly differentiate between them.
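A version chain of this kind could be recovered by grouping identifiers on everything before the final segment and ordering by revision. This Python sketch assumes every input LSID is versioned and that revisions are integers (the LSID syntax itself does not require numeric revisions); the `concepts` namespace is again hypothetical.

```python
from collections import defaultdict


def version_chains(lsids):
    """Group versioned LSIDs by unversioned base, oldest revision first.

    Assumes each LSID ends in an integer revision segment.
    """
    chains = defaultdict(list)
    for lsid in lsids:
        base, revision = lsid.rsplit(":", 1)
        chains[base].append(int(revision))
    return {base: [f"{base}:{r}" for r in sorted(revs)]
            for base, revs in chains.items()}


print(version_chains([
    "urn:lsid:taxaserver.org:concepts:3232:2",
    "urn:lsid:taxaserver.org:concepts:3232:1",
]))
```

Sorting the revisions as integers rather than strings keeps revision 10 after revision 2.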
So...is this analogous to the "Lineage" function of the original PEET model? Or is this a different sort of history tracking?
DaveThau:
If we do use the version ability of LSIDs, I think only the 'owner' of an LSID should be able to create a new version of it.
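A minimal Python sketch of that rule, assuming a hypothetical in-memory store: whoever creates the base identifier is recorded as its owner, and only that party may mint the next revision. Who the owner should be (the publishing author or the data provider) is exactly the open question raised in the reply that follows, so the owner is left as an opaque string here.

```python
class VersionedConcepts:
    """Hypothetical store in which only a concept's owner may add versions."""

    def __init__(self):
        self._owner = {}   # unversioned LSID -> owner
        self._latest = {}  # unversioned LSID -> latest revision number

    def create(self, base: str, owner: str) -> str:
        if base in self._owner:
            raise ValueError(f"{base!r} already exists")
        self._owner[base] = owner
        self._latest[base] = 1
        return f"{base}:1"

    def new_version(self, base: str, requester: str) -> str:
        if base not in self._owner:
            raise KeyError(base)
        if self._owner[base] != requester:
            raise PermissionError(f"{requester!r} does not own {base!r}")
        self._latest[base] += 1
        return f"{base}:{self._latest[base]}"


store = VersionedConcepts()
base = "urn:lsid:taxaserver.org:concepts:3232"
store.create(base, owner="Pyle")
print(store.new_version(base, requester="Pyle"))  # urn:lsid:taxaserver.org:concepts:3232:2
```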
RichardPyle: Hmmm...who would the "owner" be? The person/party who authored the publication in which it appeared (e.g., Pyle, 2003); or the person/party who captured it electronically (e.g., ITIS)?
DaveThau:
And you're right, there are some issues that need resolving, and you outline them well below:
a. When is a concept a totally new concept, deserving a new GUID?
b. When is a concept a version of an existing concept?
c. When can you change a concept and not even give it a new version?
We talked a bit about these things in Edinburgh, but haven't set about coming to an agreement on them.
RichardPyle: Yes -- that's the sort of "hair-splitting" I was driving at. If, indeed, these questions merely relate to implementation issues that can be resolved later, then I won't dwell on them now. But I can't help but feel that they relate to the fundamental nature of what the GUID is intended to represent -- and it seems to me that that ought to be clarified fairly early on.
Again...if I'm jumping the gun on something, or if these issues are clear to everyone at SEEK, then I certainly don't want to disrupt anything. But for my own edification, I'd like to make sure I understand where SEEK is heading with these things, so I can maintain maximal congruency as I continue to develop our own system.
DaveThau: Hmm... so is there consensus that we don't need the versioning information of LSIDs? That's fine with me!
Anyone feel strongly that we do need it?
(See the discussion on Semantic content of GUIDs for background to Dave's summary decision.)