Meta-Record Database Scratchpad.

A scratchpad for database design, record relationship discovery scripting, and XML design for meta-records to index all related records of any type within a single meta-record.

Purpose.

Provide a functional ILS to serve as a platform for experimenting with discovering, indexing, and using any record relationships. Provide an efficient design for basic ILS functions that works even when most non-explicit record relationships have not yet been discovered, or may never be specified, whether by choice or because of the limitations of relationship discovery.

Design Functions.

Database design.
- Individual record storage database for maintaining records as originally provided by various record sources, for ongoing testing to improve relationship matching.
- Meta-record storage.
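
The two storage functions above could be kept apart roughly as follows. This is a minimal sketch, assuming an SQL store (sqlite3 here only for illustration). Every table and column name is hypothetical and is not an existing Koha schema. Original records are never modified, so meta-records can be discarded and rebuilt whenever a relationship script improves.

```python
import sqlite3

# Hypothetical schema: all names are illustrative assumptions, not Koha tables.
SCHEMA = """
CREATE TABLE source_record (
    id          INTEGER PRIMARY KEY,
    source      TEXT NOT NULL,  -- supplying catalogue, vendor, etc.
    syntax      TEXT NOT NULL,  -- e.g. 'MARC 21', 'UNIMARC'
    record_type TEXT NOT NULL,  -- 'bibliographic', 'holdings', 'authority', ...
    original    BLOB NOT NULL   -- record exactly as received, never modified
);
CREATE TABLE meta_record (
    id  INTEGER PRIMARY KEY,
    xml TEXT NOT NULL           -- XML meta-record built from linked records
);
CREATE TABLE meta_record_member (
    meta_record_id   INTEGER NOT NULL REFERENCES meta_record(id),
    source_record_id INTEGER NOT NULL REFERENCES source_record(id),
    relationship     TEXT NOT NULL,  -- e.g. 'primary', 'holdings-of'
    PRIMARY KEY (meta_record_id, source_record_id)
);
"""

def open_store(path=":memory:"):
    """Open (or create) the store and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def members(conn, meta_record_id):
    """Return (source_record_id, relationship) pairs for one meta-record."""
    rows = conn.execute(
        "SELECT source_record_id, relationship FROM meta_record_member "
        "WHERE meta_record_id = ? ORDER BY source_record_id",
        (meta_record_id,),
    )
    return rows.fetchall()

def discard_meta_records(conn):
    """Drop all meta-records so they can be rebuilt; source records stay intact."""
    with conn:  # commits on success
        conn.execute("DELETE FROM meta_record_member")
        conn.execute("DELETE FROM meta_record")
```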

Record relationship batch scripts design.
- Batch scripts to analyse record relationships for filling meta-records.
  - (This is the difficult part. Many useful record relationships are not explicit and may be impossible to determine with perfect precision. The system must provide basic functionality without these relationship analysis scripts working well or even working to any degree. Everything else is designed to support experimenting with such scripts in a working ILS until some such scripts identify record relationships with the needed degree of precision. Sufficiently precise relationships could then populate meta-records for use by the system.)
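
One way a batch script could respect this requirement is sketched below. This is a minimal sketch, assuming that relationship scripts emit scored candidate links. The `Candidate` structure, the score scale, and the threshold are all hypothetical. Only links at or above the threshold are grouped into meta-records. A record with no accepted link still gets a group of its own, so basic functions work even when no script is precise enough.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """A proposed relationship between two stored records (hypothetical)."""
    left_id: int
    right_id: int
    relationship: str  # e.g. 'same-work'
    score: float       # the matching script's confidence, 0.0 to 1.0

def accept(candidates, threshold):
    """Split candidates into accepted links and deferred links kept for testing."""
    accepted, deferred = [], []
    for c in candidates:
        (accepted if c.score >= threshold else deferred).append(c)
    return accepted, deferred

def cluster(record_ids, accepted):
    """Group records connected by accepted links, using union-find.

    Records without accepted links remain singleton groups.
    """
    parent = {r: r for r in record_ids}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for c in accepted:
        parent[find(c.left_id)] = find(c.right_id)

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())
```

Raising the threshold until no links pass degrades the system to one record per meta-record. It does not break it.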

Meta-record design.
- Efficient indexing of bibliographic, holdings, authority, and classification records.
- Union catalogue indexing.
- FRBR relation indexing.
- FRAR relation indexing.
- FRSAR relation indexing.
- FR anything else relation indexing.
- Multiple MARC record syntax indexing: MARC 21, UNIMARC, MAB2, IBERMARC, etc.
  - (We already have DC, OAI, ONIX, and other universal XML syntax indexing, which avoids the problem of conflicting uses of the same namespace by multiple MARC syntaxes.)
- Multiple authority file systems support.

Note.

Please use the wiki to post your designs, corrections, and reasons why the design may not work or may not work efficiently.

Use the login button at the bottom of the document to log in. If you have not yet registered a login name for this wiki, use the same login button to register.

Please prepare your posting in advance and post it quickly, so that the wiki editor is not open for long and the page is not locked against other users adding content.

Please include your name and contact information in your posting if you want others to know who posted it.

In this wiki you can change anything, so please do so in a constructive manner if needed.

Origin of Development Concept within Koha.

Koha.

Koha is in transition to a series of new major versions with significant design changes to correct many major previous design mistakes and solve all the world's problems. Can an ILS really solve all the world's problems, even if it is free software? Perhaps not, but it can come a little closer if you have something to share.

Original Koha Database Design.

The original design from Katipo Communications and Horowhenua Library Trust in New Zealand used an FRBR-like model implemented in SQL. The original Koha model was developed independently of FRBR but had a similar inspiration, even though the Koha model muddled and flattened proper bibliographic relationships.

Paul Poulain adapted that design to support MARC. Paul added many fantastic innovations that put Koha ahead of other systems in some respects, but storing and indexing bibliographic records in an SQL database was a major limitation on development and performance.

Legacy Koha Indexing Design.

Currently Koha stores bibliographic and holdings information together in MARC bibliographic records. Indexed bibliographic fields are stored according to an FRBR-like hierarchy. Holdings currently uses an adaptation of the French Recommandation 995 local use field for holdings in both UNIMARC and MARC 21.

Koha has some sophisticated support, contributed by Paul Poulain, for searching using indexed reference and tracing fields in authority records. A new feature contributed by Henri-Damien Laurent has just been added for browsing broader, narrower, and parallel terms in subject authority records. However, Koha has historically underutilised authority records because there has been no support for matching authority records with bibliographic records unless the authority records were originally created in the local Koha records database.
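
The missing matching step could start as simply as the sketch below, which assumes only crude string normalization. The authority dict layout is hypothetical, as are the normalization rules: case folding, diacritic stripping, and punctuation removal. Real matching would follow a defined normalization rule set and compare subfield by subfield. Headings are matched against both authorized (1XX) and see-from (4XX) forms, and an ambiguous match is refused rather than guessed.

```python
import re
import unicodedata

def normalize_heading(heading):
    """Crude normalization: strip diacritics and punctuation, fold case and spacing."""
    decomposed = unicodedata.normalize("NFKD", heading)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(no_punct.split())

def build_authority_index(authorities):
    """Map normalized authorized and see-from forms to sets of authority ids.

    `authorities` is a hypothetical list of dicts:
    {"id": ..., "authorized": "...", "see_from": ["...", ...]}
    """
    index = {}
    for auth in authorities:
        for form in [auth["authorized"], *auth.get("see_from", [])]:
            index.setdefault(normalize_heading(form), set()).add(auth["id"])
    return index

def match_heading(index, heading):
    """Return the single matching authority id, or None if absent or ambiguous."""
    ids = index.get(normalize_heading(heading), set())
    return next(iter(ids)) if len(ids) == 1 else None
```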

Zebra.

Zebra from Index Data is a textual database being used to store records for the current development version of Koha already implemented at some libraries. Zebra is optimised for storing and indexing structured textual records such as MARC and XML. Zebra functions as a Z39.50/SRU server with queries and indexing limited to a large subset of Z39.50/SRU with sophisticated support for the indexing difficulties presented by some languages.

Zebra Indexing Problem for Koha.

Koha needs to store holdings and bibliographic information in separate records to ease the transition from the Recommandation 995 model, with all holdings in one local use field, to holdings using multiple fields such as standard MARC 21 holdings, SUDOC holdings, the recently created UNIMARC holdings format, IBERMARC holdings, etc. Koha needs to index bibliographic and holdings records together with a common key.

Koha also needs to index authority records and bibliographic records together with a common key.

Zebra has no support for combining different indexes on a common key, either at the time the indexes are built (the most efficient approach) or at query time (the inefficient SQL model of joining indexes). Additional development of Zebra to support a more flexible indexing design is beyond the capacity of the current Koha community at this time. Index Data could develop such a feature if funding were available.

Related records can be retrieved using common keys contained in the records, after the result set for a query has been parsed to find the keys. If a search is needed to constrain the related records, then a subsequent search is required against every related record. These additional searches take much too long for large result sets with many related records.
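
The cost described above can be illustrated with a small sketch. It uses a hypothetical in-memory `search` function as a stand-in for one query against the record store. This is not Zebra's API. Constraining holdings with per-key follow-up searches issues one search per bibliographic hit plus the first. A meta-record that already embeds its holdings answers the same question in a single search.

```python
def search(records, predicate):
    """Stand-in for one query against the record store (hypothetical)."""
    return [r for r in records if predicate(r)]

def related_by_per_key_search(bibs, holdings, bib_query, holdings_query):
    """One bibliographic search, then one holdings search per result key.

    Returns the matching bibliographic ids and the number of searches issued.
    """
    searches = 1
    kept = []
    for bib in search(bibs, bib_query):
        searches += 1
        related = search(
            holdings,
            lambda h: h["bib_id"] == bib["id"] and holdings_query(h),
        )
        if related:
            kept.append(bib["id"])
    return kept, searches

def related_by_meta_record_search(meta_records, bib_query, holdings_query):
    """One search over meta-records that already contain their holdings."""
    hits = search(
        meta_records,
        lambda m: bib_query(m["bib"]) and any(holdings_query(h) for h in m["holdings"]),
    )
    return [m["bib"]["id"] for m in hits], 1
```

With 5 bibliographic hits, the first approach issues 6 searches. With 100,000 hits, it issues 100,001.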

Workarounds for Needed Indexing.

Koha could continue to use the legacy indexing system while an improved indexing system independent of Zebra was developed. That option would lose much of the advantage of Zebra, because Zebra would be storing the records without providing the indexing.

Fields which need common indexing could be added to all record types. This is possible but undesirable. It would require creating redundant local use fields, in every record type, for any data that needs common indexing, so that every record would contain copies of data from all of its related records.

XML Meta-Record Workaround.

Tümer Garip, who has already found excellent solutions for much of the Koha implementation of Zebra, proposed creating meta-records in XML containing both bibliographic and holdings records. The MARC equivalent of this solution has long been used by union catalogues such as MELVYL. Joshua Ferraro extended Tümer's proposal by suggesting adding FRBR model support to the meta-record. Thomas Dukleth extended Tümer's proposal further by suggesting solving all the world's library record management problems with a super meta-record.
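
A minimal sketch of the basic meta-record proposal appears below. It wraps one MARCXML bibliographic record and its holdings records under a common key so that Zebra can index them as a single record. The element names `metaRecord`, `bibliographic`, and `holdings` and the `key` attribute are hypothetical, because no meta-record schema has been agreed yet. The original records are embedded unchanged.

```python
import xml.etree.ElementTree as ET

def build_meta_record(key, bib_xml, holdings_xml):
    """Wrap one bibliographic record and its holdings records in a meta-record.

    Element and attribute names are hypothetical placeholders.
    """
    meta = ET.Element("metaRecord", {"key": key})
    bib = ET.SubElement(meta, "bibliographic")
    bib.append(ET.fromstring(bib_xml))      # embed the original record as-is
    holdings = ET.SubElement(meta, "holdings")
    for record in holdings_xml:
        holdings.append(ET.fromstring(record))
    return ET.tostring(meta, encoding="unicode")
```

Extending this toward the FRBR proposals would mean adding further container elements, for example for work and expression groupings, alongside `bibliographic` and `holdings`.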

Existing Record Relationships Work.

FRBR.

IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional requirements for bibliographic records : final report. UBCIM publications ; new ser., v. 19. 1998. http://www.ifla.org/VII/s13/frbr/frbr.htm . http://www.ifla.org/VII/s13/frbr/frbr.pdf . The standard IFLA document.
OCLC. OCLC research activities and IFLA's functional requirements for bibliographic records. http://www.oclc.org/research/projects/frbr/ .
- OCLC has done significant research into applying FRBR. Membership fees do fund important research. It is unfortunate that more of their excellent research does not find its way out of the secret laboratory. Their efforts at discovering FRBR relationships have not tried hard enough to find some of the more difficult-to-determine relationships in MARC records, where such determinations would necessarily be imperfect.
Edward T. O'Neill. Functional requirements for bibliographic records: OCLC's experience identifying and using works. Given at FRBR Workshop, Frankfurt: 8-9 July 2004. PowerPoint. http://www.oclc.org/research/presentations/oneill/frbrddb2.ppt .
- A simple introduction to FRBR.
Network Development and MARC Standards Office Library of Congress. Functional analysis of the MARC 21 Bibliographic and Holdings Formats. April 6, 2006. http://www.loc.gov/marc/marc-functional-analysis/functional-analysis.html .
- An excellent minute mapping of FRBR relations to MARC 21.
Patrick Le Bœuf (ed.) FRBR : hype or cure-all? In Cataloging & classification quarterly. v. 39 no. 3-4 (2004). http://www.catalogingandclassificationquarterly.com/ccq39nr3-4.html .

XML SCHEMAS FOR FRBR.

XOBIS. The XML Organic Bibliographic Scheme. http://xobis.stanford.edu/ .
- “XOBIS is an XML schema which reorganizes bibliographic and authority data elements into a single, integrated structure.” That is somewhat like what we are investigating with Koha. “It also attempts to determine a middle path between the complexity of MARC and the oversimplification of the Dublin Core.” We cannot lose the richness of MARC nor its value in a basic functional ILS. XOBIS is still worth studying.

SCRIPTING FRBR RELATIONSHIP MATCHES.

Susanna Peruginelli. FRBR : some comments by ELAG (European Library Automation Group). http://www.aib.it/aib/sezioni/toscana/conf/frbr/perug-en.htm . In Associazione Italiana Biblioteche. FRBR (Functional requirements for bibliographic records) seminar. Florence: January 27-28, 2000. http://www.aib.it/aib/sezioni/toscana/conf/cfrbr.htm .
- Contains a consideration of some of the problems for record matching posed by lack of explicit specification of many FRBR relationships in existing bibliographic records. UNIMARC is found to have some advantages in relationship specification over MARC 21.
Thomas B. Hickey. Edward T. O'Neill. Jenny Toves. Experiments with the IFLA functional requirements for bibliographic records (FRBR). In D-Lib Magazine. v. 8, no. 9 (Sept. 2002). http://www.dlib.org/dlib/september02/hickey/09hickey.html .
- A report on OCLC matching experiments. Expression level matching was found to lack precision.
Matching, sorting and display specifications. http://www.loc.gov/marc/marc-functional-analysis/tool.html#table . In Network Development and MARC Standards Office Library of Congress. Functional analysis of the MARC 21 bibliographic and holdings formats : FRBR display tool. Version 2.0. http://www.loc.gov/marc/marc-functional-analysis/tool.html .
- A simple algorithm for work and expression level matching. We can do a little better.
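
A rough illustration of key-based work and expression grouping follows. It is neither the LC display-tool rules nor the OCLC work-set algorithm. It assumes a key built from the normalized main entry and title, with a language code added as a crude expression-level split. The record dict layout is hypothetical. A key like this misses works whose titles vary, such as translations with translated titles. That is where a better algorithm would have to do more.

```python
import re
import unicodedata

def _norm(text):
    """Lower-case, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^\w\s]", " ", plain.lower()).split())

def work_key(author, title):
    """Hypothetical work-level key from the normalized main entry and title."""
    return f"{_norm(author)}/{_norm(title)}"

def expression_key(author, title, language):
    """Work key plus language code: a crude expression-level distinction."""
    return f"{work_key(author, title)}/{language.lower()}"

def group_by(records, key_fn):
    """Group record ids by the key computed for each record."""
    groups = {}
    for rec in records:
        groups.setdefault(key_fn(rec), []).append(rec["id"])
    return groups
```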

WORK LEVEL MATCH SCRIPTING.

OCLC. FRBR work-set algorithm. http://www.oclc.org/research/software/frbr/default.htm .
- A sophisticated algorithm; however, it is limited to work-level matches.

MANIFESTATION LEVEL MATCH SCRIPTING.

All traditional union catalogue record matching work is relevant.
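
A typical first step in union catalogue matching is comparing standard numbers before falling back to descriptive elements. The sketch below assumes a simple two-step rule. If both records carry a valid ISBN, compare ISBNs normalized to ISBN-13. Otherwise require equal normalized title, date, and publisher. The dict layout and the fallback fields are hypothetical. Only the first token of an ISBN string is read, so qualifiers such as "(pbk.)" are ignored. Production matching would weigh many more elements and tolerate erroneous ISBNs.

```python
import re

def _check13(first12):
    """ISBN-13 check digit: weights alternate 1, 3 from the left."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

def isbn13(raw):
    """Normalize an ISBN-10 or ISBN-13 to ISBN-13; None if malformed or bad check digit."""
    tokens = raw.split()
    digits = tokens[0].replace("-", "").upper() if tokens else ""
    if re.fullmatch(r"[0-9]{9}[0-9X]", digits):
        check = 10 if digits[9] == "X" else int(digits[9])
        total = sum((10 - i) * int(d) for i, d in enumerate(digits[:9])) + check
        if total % 11:
            return None
        core = "978" + digits[:9]
        return core + str(_check13(core))
    if re.fullmatch(r"[0-9]{13}", digits) and _check13(digits[:12]) == int(digits[12]):
        return digits
    return None

def _norm(text):
    """Lower-case, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())

def same_manifestation(a, b):
    """Hypothetical pairwise test: ISBN first, else title + date + publisher."""
    ia, ib = isbn13(a.get("isbn", "")), isbn13(b.get("isbn", ""))
    if ia and ib:
        return ia == ib
    for field in ("title", "date", "publisher"):
        va, vb = _norm(a.get(field, "")), _norm(b.get(field, ""))
        if not va or va != vb:
            return False
    return True
```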

FRAR.

IFLA UBCIM. Working Group on the Functional Requirements and Numbering of Authority Records. (FRANAR). Functional requirements for authority records : a conceptual model ; draft. 2005. http://www.ifla.org/VII/d4/FRANAR-Conceptual-M-Draft-e.pdf .
- The standard IFLA draft document.

FRSAR.

IFLA UBCIM. Working Group on the Functional Requirements for Subject Authority Records. (FRSAR). http://www.ifla.org/VII/s29/wgfrsar.htm .
Tom Delsey. Modeling subject access : extending FRBR and FRAR conceptual models. In Patrick Le Bœuf (ed.) FRBR : hype or cure-all? In Cataloging & classification quarterly. v. 39 no. 3-4 (2004). p. 49-61. http://www.haworthpress.com/store/E-Text/View_EText.asp?a=3&fn=J104v39n03_04&i=3%2F4&s=J104&v=39 .
Tom Delsey. Modeling subject access : refining and extending FRBR and FRAR conceptual models. http://www.kaapeli.fi/~fla/frbr05/delseyModeling%20subject%20access.pdf . In IFLA General Conference Satellite meeting. World Library and Information Congress : bibliotheca universalis - how to organize chaos? 11-12 Aug. 2005. http://www.kaapeli.fi/~fla/frbr05/prgr.htm .
- Nice diagrams with FRBR, FRAR, and FRSAR relations.

Database Design.

Record Relationship Discovery Scripting.

Meta-Record Schema Design.

Thomas wrote this and fell asleep before continuing later …