The Koha Online Catalog: A Working Definition

In versions of Koha prior to 2.4, the goal with Koha's MARC support was to get a functioning ILS in place that was capable of storing MARC records correctly. But now we have a more ambitious goal: we want our ILS to be capable of searching the semantic information in MARC records to the fullest extent possible. A secondary goal is to provide easy access from the Online Catalog to resources that extend beyond just the bibliographic records for library holdings.

This Wiki page provides a workspace where Koha developers, cataloging staff, and general staff can post ideas, requests, and questions for how Koha handles searching (and display) of bibliographic records and access to other resources.

For those who want to know much more about Koha Zebra configuration, I propose to set KohaZebraConfiguration that will state the Zebra attribute, with an explanation and then the link to whichever field in MARC-21 but also UNIMARC.

Scope

There are many considerations in constructing a working definition of the Koha Catalog. Ultimately, our working definition will consist of individual goals. An example of a goal might be “I want to be able to search for an exact title like “It” for Stephen King, and have it be the first record in the result set”. To realize a given goal, we must define a set of practices in four areas:

Search Indexes

The indexes are where we define:

  • how MARC fields should be grouped together as 'search points' (eg, 'author', 'date', 'exact title' are search points)
  • what kinds of searches we can do on those grouping (eg, 'number' search, 'phrase' search)
  • how to search within certain fields for data (specific positions of fixed fields for instance)

MARC Frameworks

Koha's MARC Frameworks are where we define:

  • what constitutes a MARC record (what fields/subfields)
  • labels for each field
  • how the fields are handled within the MARC editor
  • how the fields should be displayed in search results and details pages
  • a mapping between MARC records and Koha's item management (issues, reserves, circ rules, barcodes, etc.)

Cataloging

Consistant cataloging practices are, together with Frameworks and Indexes, an essential component to searching. Here are some things to think about:

  • NPL employs 'copy-cataloging', not original cataloging, so records often come from different sources that may have different cataloging practices.
  • in areas where no official rule has been made in AACR2 or similar cataloging manuals, Koha will need a consistant practice in order to properly index records
  • with over 2000 edit points per record, we need to identify clearly which of those are most important for purposes of search and display

Interface Design

The Koha OPAC is an interface through which patrons and staff construct queries of the data. The interface needs to be fast, accurate, and intuitive to use if it is to be a useful search tool of the library's collections.


Our task then, is to construct a working set of expectations and definitions of the above. The definitions can then be applied directly to each of the four categories to realize a given search goal.

Discussion Points

Contents?

008/24-27 review, catalog, encyclopedia, directory.

Formats?

MARC records don't have a consistant way to distinguish between copyright and publication dates (that I can tell), so we have two date types to think about: copyright/publication, and acquisition. Here are some related MARC fields for each:

Simple Cases

First, some (I hope) simple ones:

* Regular Print 007/00-01='ta' * CD Audio 007/00-01='co' * Cassette Recording 007/00-01='sd' * VHS 007/00-01='vf' * DVD 007/00-01='vd' * CD Software 007/00-01='co' * Braille: 007/00-01='fb'

Large print:
      d in 008/23 and 006/06 Form of item (BK MU SE MX)
      d in 008/29 and 006/12 Form of item (MP VM)
      d in 008/22 and 006/05 Form of original item (SE)
      b in 007/01 Specific material designation
      008 'd'/23 with the following:                
              BK: Leader/07 (Bibliographic level) 'a' (Monographic component part), 'c' (Collection), 'm' (Monograph/Item)                
              MU: Leader/06 (Type of Record) 'c' (Printed music), 'd' (Manuscript music), 'i' (Non-musical sound recording), or 'j' (Musical sound recording)
              SE: Leader/07 's' (Serial) or 'b' (Serial component part).
              MX: Leader/06 'p'
      008 - 'd'/29 with the following:
              MP:Leader/06 (Type of Record) by code e (Printed map) or code f (Manuscript map)
              VM: Leader/06 'g', 'k', 'o', 'r'
      008 - 'd'/22
              SE: Leader/07 by code s (Serial) or code b (Serial component part).
      007 - 'tb'

So … Something is Large print if:

      (
      ( ((LDR-07='a' or LDR-07='c' or LDR-07='m') and 008-23 = 'd') or
      ((LDR-06='c' or LDR-06='d' or LDR-06='i' or LDR-06='j') and 008-29='d') or
      ((LDR-07='s' or LDR-07='b') and 008-23='b') or
      (LDR-06='p' and 008-23='b') ) or
      ( ((LDR-06='e' or LDR-06='f') and 008-29='d') or
      ((LDR-06='g' or LDR-06='k' LDR-06='o' LDR-06='r') and 008-29='d') ) or
      ( ((LDR-07='s' or LDR-07='b') and 008-22='d') or
      007-01='tb')

Dates

Summary of Koha 3.0 date indexing for MARC21

 Index                   Expected format         Notes
 -----------------------------------------------------
 date-entered-on-file    [yymmdd]                (008/0-5, indexed in word and sort indexes)
 copydate                [yyyy]                  (260$c, indexed in word and sort indexes)
 acqdate                 [yyyy-mm-dd]            (952$d, indexed in date,word,sort indexes)
 pubdate                 [yyyy]                  (008/7-10, indexed in year,word,sort indexes)
 
 Template Search Parameters Tested:
      limit-yr (either yyyy or yyyy-yyyy) (added processing for ge le, structure attribute st-numeric, etc.)
      yr pubdate (yyyy) 
      acqdate,st-date-normalized (yyyy-mm-dd)
 
 Template Sort Parameters Tested:
      pubdate_dsc
      pubdate_asc
      acqdate_dsc
      acqdate_asc

Notes

copyright/publication dates
   008 / 07-10 : generally a primary date associated with the 
   publication, distribution, etc. of an item and the beginning 
   date of a collection
   008 / 11-14 : secondary date associated with the publication
   distribution, etc. of an item and the ending date of a collection.
   For books and visual materials, this may be a detailed date which
   represents a month and day.
   260
   362
  • Index I propose to index the 008/07-10 field and make that the date field used for date searches
  • MARC Framework The framework should require that 008/07-10 be filled with values
  • Cataloging We need to make sure that all our records have values in the 008/07-10
  • Interface Design What ways do we want to be able to search on dates? in a range, individually?
acquisition date
   942$k : stored as yyyymmddhhmmss

Item Types, Circulation Rules, etc.

For the Zebra version of Koha, we're breaking up the itemtypes into four categories:

  1. collection code (the original itemtype)
  2. audience
  3. content
  4. format

To do this, we are using a combination of several fields in the record to derive each category.

Leader

 LDR/06  type of record

FORMAT OF ITEM

 MARC Field: 007/1,2 (form of item)
 ta = everything else = 'regular print'
 tb = LP,LPNF,LP J, LP YA,LP JNF,LP YANF = 'large print'
 sd = CDM,AB,JAB,JABN,YAB,YABN,ABN, = 'sound disk'
 co = CDR = 'CD-ROM'
 vf = AV,AVJ,AVNF,AVJNF = 'VHS'
 vd = DVD,DVDN,DVDJ,DVJN = 'DVD'
 ss = JAC,YAC,AC,JACN,YACN,ACN = 'sound cassette'

TARGET AUDIENCE

 MARC Field: 008/22 (target audience)
 a = EASY
 b = EASY
 c = J,JNF,JAB,JABN,AVJ,AVJNF,JAC,JACN (juvenile)
 d = YA,YANF,YAB,YABN,YAC,YACN (young adult)
 e = everything else (adult)
 j = J,JNF,JAB,JABN,AVJ,AVJNF,JAC,JACN,DVDJ,DVDJN (juvenile)

CONTENT

 MARC Field: 008/33,34
 normal records:
 008 / 33 fiction/non-fiction
 008 / 34 biography
 (what about mystery ... are they are there any others?)
 video recordings: MARC Field 880/33
 v = videorecording
 
 008 / 34 l live action
 008 / 34 a animation
 008 / 34 c animation and live action
 sound recordings:
 008 / 30-31 a autobiography
       b biography
       d drama
       etc.
 AUDIO BOOKS
 LDR nim a  00
 008/  30, 31
 Guidelines for applying content designators:
 
 Code:   Description:
 #       Item is a music sound recording  When # is used, it is followed by
 another blank (##).
 a       Autobiography
 b       Biography
 c       Conference proceedings
 d       Drama
 e       Essays
 f       Fiction  Fiction includes novels, short stories, etc.
 g       Reporting  Reports of news-worthy events and informative messages
 are included in this category.
 h       History  History includes historical narration, etc., that may also
 be covered by one of the other codes (e.g., historical poetry).
 i       Instruction  Instructional text includes instructions on how to
 accomplish a task, learn an art, etc. (e.g., how to replace a light
 switch).  Note:  Language instruction text is assigned code j.
 j       Language instruction  Language instructional text may include
 passages that fall under the definition for one of the other codes
 (e.g., language text that includes poetry).
 k       Comedy  Spoken comedy.
 l       Lectures, speeches  Literary text is lectures and/or speeches.
 m       Memoirs  Memoirs are usually autobiographical.
 n       Not applicable  Item is not a sound recording (e.g., printed or
 manuscript music).
 o       Folktales
 p       Poetry
 r       Rehearsals  Rehearsals are performances of any of a variety of
 nonmusical productions.
 s       Sounds  Sounds include nonmusical utterances and vocalizations that
 may or may not convey meaning.
 t       Interviews
 z       Other  Type of literary text for which none of the other defined
 codes are appropriate.
 |       No attempt to code
 
 MUSIC
 LDR njm a  00
 008 / 30,31  (usually blank)
 008 / 18,19  composition form
 
 Guidelines for applying content designators:
 
 Code:   Description:
 an      Anthems
 bd      Ballads
 bt      Ballets
 bg      Bluegrass music
 bl      Blues
 cn      Canons and rounds  i.e., compositions employing strict imitation
 throughout
 ct      Cantatas
 cz      Canzonas  Instrumental music designated as a canzona.
 cr      Carols
 ca      Chaconnes
 cs      Chance compositions
 cp      Chansons, polyphonic
 cc      Chant, Christian
 cb      Chants, Other
 cl      Chorale preludes
 ch      Chorales
 cg      Concerti grossi
 co      Concertos
 cy      Country music
 df      Dance forms  Includes music for individual dances except those that
 have separate codes defined:  mazurkas, minuets, pavans, polonaises,
 and waltzes.
 dv      Divertimentos, serenades, cassations, divertissements, and notturni
 Instrumental music designated as a divertimento, serenade, cassation,
 divertissement, or  notturno.
 ft      Fantasias  Instrumental music designated as fantasia, fancies,
 fantasies, etc.
 fm      Folk music  Includes folk songs, etc.
 fg      Fugues
 gm      Gospel music
 hy      Hymns
 jz      Jazz
 md      Madrigals
 mr      Marches
 ms      Masses
 mz      Mazurkas
 mi      Minuets
 mo      Motets
 mp      Motion picture music
 mc      Musical revues and comedies
 mu      Multiple forms
 nc      Nocturnes
 nn      Not applicable  Indicates that form of composition is not applicable
 to the item.  Used for any item that is a non-music sound recording.
 op      Operas
 or      Oratorios
 ov      Overtures
 pt      Part-songs
 ps      Passacaglias  Includes all types of ostinato basses.
 pm      Passion music
 pv      Pavans
 po      Polonaises
 pp      Popular music
 pr      Preludes
 pg      Program music
 rg      Ragtime music
 rp      Rhapsodies
 rq      Requiems
 ri      Ricercars
 rc      Rock music
 rd      Rondos
 sd      Square dance music
 sn      Sonatas
 sg      Songs
 st      Studies and exercises  Used only when the work is intended for
 teaching purposes (usually entitled Studies, Etudes, etc.).
 su      Suites
 sp      Symphonic poems
 sy      Symphonies
 tc      Toccatas
 ts      Trio-sonatas
 uu      Unknown  Indicates that the form of composition of an item is
 unknown.  Used when the only indication given is the number of
 instruments and the medium of performance.  No structure or genre is
 given, although they may be implied or understood.
 vr      Variations
 wz      Waltzes
 zz      Other  Indicates a form of composition for which none of the other
 defined codes are appropriate (e.g., villancicos, incidental music,
 electronic music, etc.).
 |       No attempt to code
  • Index I propose that the above guidelines be used for indexing a record for its itemtype, format, audience, and content
  • MARC Framework The framework should require that the above fields be filled with values
  • Cataloging We need to make sure that all our records have appropriate values in the above fields
  • Interface Design need to make sure the interface is easy to use

Organization of Materials

This gets tricky. Please keep in mind that I haven't had any formal library science training and the following is what I've gleaned by working with librarians from many different systems. Every library seems to handle these issues differently, but here are some definitions that I hope are universal:

  • Collection Code - used to specify circulation rules on a given record or item
  • Classification - a taxonomy for organizing a library collection into subjects
  • Shelving Location - the general location of an item within the library (general stacks, reference area, new books shelf, science fiction area, etc.)
  • Call Number - a standards-based scheme for organization of a given item on the shelf. Typically, the call number is composed of some part of the classification
  • Local Call Number - a locally-defined scheme for organizing items on the shelf.
  • Item Call Number - an item-specific call number, sometimes used to distinguish between two of the same item on the same shelf. Also used for inventory as a way to specify which shelf a given item is associated with.

Libraries typically simplify the above elements to simplify record maintenance and searching of materials. For instance, NPL currently uses a simplified scheme that consists of the following:

Name Use Composition Location
Item Type general shelving location, circulation rules locally defined 942$c
Call Number shelf order, subject classification from Dewey or locally defined 942$c

For Koha 2.4, we're proposing to change that scheme slightly to enable better search options in the catalog. Here is the scheme that we're proposing:

Name Use Composition Location
Classification subject classification Dewey 082
Collection Code (itemtype) circulation rules, general shelving location locally defined 942$c
Call Number shelf order Local Call Number (fiction) or Classification (non-fiction) ?
Local Call Number shelf order NPL's local call number scheme ( <itemtype> <author's last name> ) 942$c
Item Call Number inventory Call Number 952?

Looking forward, we may want to adopt an even more complete scheme such as the following:

Name Use Composition Location
Classification subject classification Dewey 082
Collection Code circulation rules locally defined 942$c
Shelving Location Code location of item (new items, general stacks, mysteries and sci-fi, etc.) locally defined ?
Call Number shelf order Local Call Number (fiction) or Classification (non-fiction) ?
Local Call Number shelf order NPL's local call number scheme ( <itemtype> <author's last name> ) 942$c
Item Call Number inventory Call Number + some other identifier ?

Here are some additional thoughts on the topic of Material Organization

  • There is currently crossover between itemtypes and call numbers but I think we can safely ignore it
  • Staff need search and sort by 'Call Number'. A 'Call Number Search' is defined as:
    • search Classification
    • if not found, search 'Local Call Number'
    • sort of this search point is based on which type of 'call number' the search was on
  • Sorting by call numbers outside of the context of a call number search will consist of sorting by number first, then by text
  • Item Call Numbers are required for inventory
  • NPL does not use shelving locations

Display of Records

Here is a list of requests I know about:

  • Volume Numbers (245$n) should be included in title display and search
  • Subjects should display in a semantically correct way
 
zebrasearchingdefinitions.txt · Last modified: 2007/12/17 10:03 by kados
 
Except where otherwise noted, content on this wiki is licensed under the following license:CC Attribution-Noncommercial-Share Alike 3.0 Unported
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki