Meta-Record Database Scratchpad.

A scratchpad for database design, record relationship discovery scripting, and XML design for meta-records to index all related records of any type within a single meta-record.

Purpose.

Provide a functional ILS to serve as a platform for experimenting with discovering, indexing, and using any record relationships. Provide an efficient design for basic ILS functions when most non-explicit record relationships have not yet been discovered, or may never be specified, whether by choice or because of the limitations of relationship discovery.

Design Functions.

  • Database design.
    • Individual record storage database for maintaining records as originally provided by various record sources, for ongoing testing to improve relationship matching.
    • Meta-record storage.
  • Record relationship batch scripts design.
    • Batch scripts to analyse record relationships for filling meta-records.
      • (This is the difficult part. Many useful record relationships are not explicit and may be impossible to determine with perfect precision. The system must provide basic functionality without these relationship analysis scripts working well or even working to any degree. Everything else is designed to support experimenting with such scripts in a working ILS until some such scripts identify record relationships with the needed degree of precision. Sufficiently precise relationships could then populate meta-records for use by the system.)
  • Meta-record design.
    • Efficient indexing of bibliographic, holdings, authority, and classification records.
    • Union catalogue indexing.
    • FRBR relation indexing.
    • FRAR relation indexing.
    • FRSAR relation indexing.
    • FR anything else relation indexing.
    • Multiple MARC record syntax indexing: MARC 21, UNIMARC, MAB2, IBERMARC, etc.
      • (We already have DC, OAI, ONIX, and other universal XML syntax indexing without having the problem of conflicting uses of the same name-space with multiple MARC syntaxes.)
    • Multiple authority file systems support.

Note.

Please use the wiki to post your designs, corrections, and reasons why the design may not work or may not work efficiently.

Use the login button at the bottom of the document to log in. If you have not yet registered a login name for this wiki, use the same login button to register.

Please post quickly, with a posting prepared in advance, so that the wiki editor is not open too long and the page is not locked, preventing other users from adding content.

Please include your name and contact information in your posting if you want others to know who posted it.

In this wiki you can change anything, so please do so in a constructive manner if needed.

Origin of Development Concept within Koha.

Koha.

Koha is in transition to a series of new major versions with significant design changes to correct many major previous design mistakes and solve all the world's problems. Can an ILS really solve all the world's problems, even if it is free software? Perhaps not but it can come a little closer if you have something to share.

Original Koha Database Design.

The original design from Katipo Communications and Horowhenua Library Trust in New Zealand used an FRBR-like model implemented in SQL. The original Koha model was developed independently of FRBR but had a similar inspiration, even if the Koha model muddled and flattened proper bibliographic relationships.

Paul Poulain adapted that design to support MARC. Paul added many fantastic innovations that put Koha ahead of other systems in some respects, but attempting to both store and index bibliographic records in an SQL database was a major limitation on development and performance.

Legacy Koha Indexing Design.

Currently Koha stores bibliographic and holdings information together in MARC bibliographic records. Indexed bibliographic fields are stored according to an FRBR-like hierarchy. Holdings currently use an adaptation of the French Recommandation 995 local use field for holdings in both UNIMARC and MARC 21.

Koha has some sophisticated support, contributed by Paul Poulain, for searching using indexed reference and tracing fields in authority records. A new feature, contributed by Henri-Damien Laurent, has just been added for browsing broader, narrower, and parallel terms in subject authority records. However, Koha has historically underutilised authority records because there had been no support for matching authority records with bibliographic records unless the authority records were originally created in the local Koha records database.

Zebra.

Zebra, from Index Data, is a textual database used to store records in the current development version of Koha, which is already implemented at some libraries. Zebra is optimised for storing and indexing structured textual records such as MARC and XML. Zebra functions as a Z39.50/SRU server, with queries and indexing limited to a large subset of Z39.50/SRU, and has sophisticated support for the indexing difficulties presented by some languages.

Zebra Indexing Problem for Koha.

Koha needs to store holdings and bibliographic information in separate records to most easily transition from the Recommandation 995 model, in which all holdings sit in one local use field, to holdings using multiple fields such as standard MARC 21 holdings, SUDOC holdings, the recently created UNIMARC holdings format, IBERMARC holdings, etc. Koha needs to index bibliographic and holdings records together with a common key.

Koha also needs to index authority records and bibliographic records together with a common key.

Zebra has no support for combining different indexes based on a common key, either at the time the indexes are built, which would be most efficient, or at query time, as in the less efficient SQL model of joining indexes. Additional development of Zebra to support a more flexible indexing design is outside the capacity of the current Koha community to support at this time. Index Data could develop such a feature if funding were available.
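
To make the idea of combining indexes on a common key at build time more concrete, here is a minimal sketch in Python. It is not Zebra or Koha code. The record layout, the "bib_key" field, and the function names are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical, simplified records: each one carries a shared key ("bib_key").
bibliographic = [
    {"bib_key": "b1", "title": "Moby Dick"},
    {"bib_key": "b2", "title": "Walden"},
]
holdings = [
    {"bib_key": "b1", "branch": "Main", "status": "available"},
    {"bib_key": "b1", "branch": "East", "status": "on loan"},
    {"bib_key": "b2", "branch": "Main", "status": "on loan"},
]

def tokens(record):
    """Yield lower-cased words from every field except the common key."""
    for field, value in record.items():
        if field != "bib_key":
            yield from value.lower().split()

def build_combined_index(*record_sets):
    """Map each term to the set of common keys whose records contain it."""
    index = defaultdict(set)
    for records in record_sets:
        for record in records:
            for term in tokens(record):
                index[term].add(record["bib_key"])
    return index

def search(index, *terms):
    """Return keys matching all terms, whichever record type supplied them."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()

index = build_combined_index(bibliographic, holdings)
print(sorted(search(index, "moby", "east")))    # ['b1']
print(sorted(search(index, "walden", "east")))  # []
```

One query can then mix a bibliographic term with a holdings term. The sketch flattens all holdings for a key into one posting list, so it cannot tell which individual copy matched. A real design would need to keep that distinction.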

Related records can be retrieved from common keys contained in the records after the result set for a query is parsed to find the keys. If a search is needed to constrain the related records, then subsequent searches are required against every related record. Additional searches against each related record take much too long for large result sets with a large number of related records.

Workarounds for Needed Indexing.

Koha could continue to use the legacy indexing system while an improved indexing system independent of Zebra was developed. That option would lose much of the advantage of Zebra if Zebra were storing the records without providing the indexing.

Fields which need common indexing could be added to all record types. This is a possible but undesirable solution: it would require creating redundant local use fields holding any data that needs common indexing in every record type, such that every record would, in effect, contain every record.

XML Meta-Record Workaround.

Tümer Garip, who has found excellent solutions for much of the Koha implementation of Zebra already, proposed creating meta-records in XML containing both bibliographic and holdings records. The MARC equivalent to this solution has long been used by union catalogues such as MELVYL. Joshua Ferraro extended Tümer's proposal by suggesting adding FRBR model support in the meta-record. Thomas Dukleth extended Tümer's proposal further by suggesting solving all the world's library record management problems with a super meta-record.
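
As a rough illustration of the basic proposal, the following Python sketch wraps one bibliographic record and its holdings records in a single XML meta-record. The element names "metaRecord", "bibliographic", and "holdings", the "key" attribute, and the simplified MARCXML-like sample records (without namespaces) are assumptions for illustration. No meta-record schema has been defined yet.

```python
import xml.etree.ElementTree as ET

def build_meta_record(key, bibliographic_xml, holdings_xml_list):
    """Wrap one bibliographic record and its holdings in a single element."""
    meta = ET.Element("metaRecord", {"key": key})  # hypothetical element name
    bib_wrapper = ET.SubElement(meta, "bibliographic")
    bib_wrapper.append(ET.fromstring(bibliographic_xml))
    holdings_wrapper = ET.SubElement(meta, "holdings")
    for holdings_xml in holdings_xml_list:
        holdings_wrapper.append(ET.fromstring(holdings_xml))
    return meta

# Made-up sample records in a simplified MARCXML-like form.
bib = (
    '<record type="Bibliographic">'
    '<datafield tag="245" ind1="1" ind2="0">'
    '<subfield code="a">Walden</subfield>'
    '</datafield></record>'
)
holding_main = (
    '<record type="Holdings">'
    '<datafield tag="852" ind1=" " ind2=" ">'
    '<subfield code="b">Main</subfield>'
    '</datafield></record>'
)
holding_east = holding_main.replace("Main", "East")

meta = build_meta_record("b2", bib, [holding_main, holding_east])
print(ET.tostring(meta, encoding="unicode"))
```

Because the whole meta-record is one XML document, it could be indexed as a single record, so a query could address bibliographic and holdings fields together. Further wrappers for authority records or FRBR relations could be added in the same way.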

Existing Record Relationships Work.

FRBR.

XML SCHEMAS FOR FRBR.

  • XOBIS. The XML Organic Bibliographic Scheme. http://xobis.stanford.edu/ .
    • “XOBIS is an XML schema which reorganizes bibliographic and authority data elements into a single, integrated structure.” That is somewhat like what we are investigating with Koha. “It also attempts to determine a middle path between the complexity of MARC and the oversimplification of the Dublin Core.” We cannot lose the richness of MARC nor its value in a basic functional ILS. XOBIS is still worth studying.

SCRIPTING FRBR RELATIONSHIP MATCHES.

WORK LEVEL MATCH SCRIPTING.

MANIFESTATION LEVEL MATCH SCRIPTING.

All traditional union catalogue record matching work is relevant.
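
One common starting point in that kind of matching is a normalised match key. The sketch below is only an illustration of the idea, not a method specified on this page. The record fields, the made-up sample data, and the rule of using the ISBN when present and otherwise title plus year are all assumptions.

```python
import re
import unicodedata
from collections import defaultdict

def normalise_text(text):
    """Lower-case, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", stripped.lower()).split())

def normalise_isbn(isbn):
    """Keep only digits and the X check character."""
    return re.sub(r"[^0-9Xx]", "", isbn).upper()

def match_key(record):
    """Assumed rule: prefer the ISBN, otherwise fall back to title and year."""
    if record.get("isbn"):
        return ("isbn", normalise_isbn(record["isbn"]))
    return ("title-year", normalise_text(record["title"]), record.get("year"))

def group_candidates(records):
    """Group record ids by match key, keeping only keys shared by 2+ records."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Made-up sample records; the ISBN is deliberately not a real one.
records = [
    {"id": "r1", "title": "Walden", "isbn": "0-000-00000-1", "year": "2004"},
    {"id": "r2", "title": "Walden.", "isbn": "0000000001", "year": "2004"},
    {"id": "r3", "title": "Les Misérables", "year": "1862"},
    {"id": "r4", "title": "les miserables", "year": "1862"},
    {"id": "r5", "title": "Walden", "year": "1854"},
]

candidates = group_candidates(records)
for key, ids in candidates.items():
    print(key, ids)
```

Groups produced this way are only candidates. As noted under Design Functions, many relationships cannot be determined with perfect precision, so a match key like this would need testing against real records before it populated any meta-record.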

FRAR.

FRSAR.

Database Design.

Record Relationship Discovery Scripting.

Meta-Record Schema Design.

Thomas wrote this and fell asleep before continuing later …

 
en/development/super_meta_record_db.txt · Last modified: 2006/08/10 14:38 by thd
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported