It is important to get a sound and satisfactory model of Taxon Concepts without being influenced by implementation concerns. However, various principles of information management should be borne in mind, e.g.:
- Duplication of data, e.g. explicit recording of (denormalised) information without the use of keys. Redundancy is inefficient and can be error-prone, but can allow efficient short-cuts for queries.
- Storing and reusing entities via IDs or keys is efficient for storage, but the keys will need to be resolved on data retrieval and query, and can cause problems when entering data (if this requires key lookup).
- Interoperable modules (e.g. schemas used as XML includes) will need to be carefully designed to ensure there are no invalid interdependencies, namespaces, etc.
- Connection to external resources will never be guaranteed. Gregor has proposed Proxy objects to hold both the resource connection details and a representation of the data, though not in the same structure as would be supplied by the resource. Should the data have the same structure? Or is a simpler model preferred, where only the required resource data fields are represented, together with a record of the resource connector, GUID, etc. to allow verification and subsequent lookup?
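
To make the trade-offs above more concrete, here is a minimal Python sketch. It is only an illustration, not a proposed implementation: all class, field and function names (`Name`, `KeyedConcept`, `DenormalisedConcept`, `ResourceProxy`, `resolve_name`) are hypothetical, "Aus bus" is a placeholder name, and the proxy shown is the "simpler model" variant (a few cached fields plus connector and GUID), which is just one reading of the proposal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Name:
    """A reusable entity, stored once and referenced by key (hypothetical)."""
    key: str
    full_name: str


@dataclass
class KeyedConcept:
    """Normalised form: holds only the key of the name it uses."""
    concept_id: str
    name_key: str


@dataclass
class DenormalisedConcept:
    """Denormalised form: carries its own copy of the name string."""
    concept_id: str
    full_name: str


def resolve_name(concept: KeyedConcept, names: Dict[str, Name]) -> str:
    """Key resolution step needed on retrieval and query.

    Raises KeyError if the key is unknown, which is the kind of problem
    that can also surface at data entry when a key lookup is required.
    """
    return names[concept.name_key].full_name


# Assumed signature for a resource lookup: (connector, guid) -> field values.
Fetch = Callable[[str, str], Dict[str, str]]


@dataclass
class ResourceProxy:
    """Simpler proxy variant: connection details plus the required fields."""
    guid: str
    connector: str
    cached_fields: Dict[str, str] = field(default_factory=dict)

    def get(self, field_name: str, fetch: Optional[Fetch] = None) -> Optional[str]:
        """Return a field, refreshing from the resource when it is reachable.

        If the lookup fails, fall back to the locally held representation,
        since the connection to the external resource is never guaranteed.
        """
        if fetch is not None:
            try:
                self.cached_fields.update(fetch(self.connector, self.guid))
            except ConnectionError:
                pass  # resource unavailable: keep the cached values
        return self.cached_fields.get(field_name)
```

With the keyed form, every read goes through `resolve_name`; with the denormalised form the name is available directly, but a second copy has to be kept in step with the original. The proxy sits in between: it keeps a local copy of the required fields, and the stored connector and GUID allow that copy to be verified or refreshed later.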