In versions of Koha prior to 2.4, the goal of Koha's MARC support was to get a functioning ILS in place that could store MARC records correctly. Now we have a more ambitious goal: we want our ILS to be able to search the semantic information in MARC records as fully as possible. A secondary goal is to provide easy access from the Online Catalog to resources beyond the bibliographic records for library holdings.
This wiki page provides a workspace where Koha developers, cataloging staff, and general staff can post ideas, requests, and questions about how Koha handles searching (and display) of bibliographic records and access to other resources.
For those who want to know more about Koha's Zebra configuration, I propose creating a KohaZebraConfiguration page that lists each Zebra attribute with an explanation and a link to the corresponding field in both MARC 21 and UNIMARC.
There are many considerations in constructing a working definition of the Koha Catalog. Ultimately, our working definition will consist of individual goals. An example of a goal might be: “I want to be able to search for an exact title like ‘It’ by Stephen King and have it be the first record in the result set.” To realize a given goal, we must define a set of practices in four areas:
Search Indexes
The indexes are where we define:
MARC Frameworks
Koha's MARC Frameworks are where we define:
Cataloging
Consistent cataloging practices are, together with Frameworks and Indexes, an essential component of searching. Here are some things to think about:
Interface Design
The Koha OPAC is an interface through which patrons and staff construct queries of the data. The interface needs to be fast, accurate, and intuitive to use if it is to be a useful search tool of the library's collections.
Our task, then, is to construct a working set of expectations and definitions for the above. The definitions can then be applied directly to each of the four areas to realize a given search goal.
008/24-27 (nature of contents): review, catalog, encyclopedia, directory.
MARC records don't have a consistent way to distinguish between copyright and publication dates (as far as I can tell), so we have two date types to think about: copyright/publication and acquisition. Here are some related MARC fields for each:
First, some (I hope) simple ones:
* Regular print: 007/00-01='ta'
* CD audio: 007/00-01='sd'
* Cassette recording: 007/00-01='ss'
* VHS: 007/00-01='vf'
* DVD: 007/00-01='vd'
* CD software: 007/00-01='co'
* Braille: 007/00-01='fb'
Large print is coded in several places:
* 'd' in 008/23 and 006/06: Form of item (BK, MU, SE, MX)
* 'd' in 008/29 and 006/12: Form of item (MP, VM)
* 'd' in 008/22 and 006/05: Form of original item (SE)
* 'b' in 007/01: Specific material designation
008/23 = 'd' applies with the following:
* BK: Leader/07 (Bibliographic level) 'a' (Monographic component part), 'c' (Collection), or 'm' (Monograph/Item)
* MU: Leader/06 (Type of record) 'c' (Printed music), 'd' (Manuscript music), 'i' (Nonmusical sound recording), or 'j' (Musical sound recording)
* SE: Leader/07 's' (Serial) or 'b' (Serial component part)
* MX: Leader/06 'p' (Mixed materials)
008/29 = 'd' applies with the following:
* MP: Leader/06 (Type of record) 'e' (Printed map) or 'f' (Manuscript map)
* VM: Leader/06 'g', 'k', 'o', or 'r'
008/22 = 'd' applies with the following:
* SE: Leader/07 's' (Serial) or 'b' (Serial component part)
007/00-01 = 'tb' (text, large print)
So, something is large print if any of the following is true:
( (LDR-07='a' or LDR-07='c' or LDR-07='m') and 008-23='d' ) or
( (LDR-06='c' or LDR-06='d' or LDR-06='i' or LDR-06='j') and 008-23='d' ) or
( (LDR-07='s' or LDR-07='b') and 008-23='d' ) or
( LDR-06='p' and 008-23='d' ) or
( (LDR-06='e' or LDR-06='f') and 008-29='d' ) or
( (LDR-06='g' or LDR-06='k' or LDR-06='o' or LDR-06='r') and 008-29='d' ) or
( (LDR-07='s' or LDR-07='b') and 008-22='d' ) or
( 007/00-01='tb' )
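To check the expression, here is a minimal Python sketch that applies it to a record's Leader, 008, and 007 passed in as plain strings (an assumption; a real indexer would read them from a MARC record object). Positions are 0-based as in MARC 21, the rule table transcribes the conditions listed for large print, and the names `LARGE_PRINT_RULES`, `char_at`, and `is_large_print` are hypothetical.

```python
# Each rule: (Leader position, codes at that position, 008 position that must be 'd').
# Transcribed from the large-print conditions; positions are 0-based.
LARGE_PRINT_RULES = [
    (7, {"a", "c", "m"}, 23),       # BK: Leader/07 a, c, m -> 008/23
    (6, {"c", "d", "i", "j"}, 23),  # MU: Leader/06 c, d, i, j -> 008/23
    (7, {"s", "b"}, 23),            # SE: Leader/07 s, b -> 008/23
    (6, {"p"}, 23),                 # MX: Leader/06 p -> 008/23
    (6, {"e", "f"}, 29),            # MP: Leader/06 e, f -> 008/29
    (6, {"g", "k", "o", "r"}, 29),  # VM: Leader/06 g, k, o, r -> 008/29
    (7, {"s", "b"}, 22),            # SE: form of original item -> 008/22
]


def char_at(value: str, index: int) -> str:
    """Return the character at a 0-based position, or '' if the field is too short."""
    return value[index] if 0 <= index < len(value) else ""


def is_large_print(leader: str, f008: str, f007: str = "") -> bool:
    """Return True if any large-print condition matches (hypothetical helper)."""
    for ldr_pos, codes, pos_008 in LARGE_PRINT_RULES:
        if char_at(leader, ldr_pos) in codes and char_at(f008, pos_008) == "d":
            return True
    # Final clause: 007/00-01 = 'tb' (text, large print).
    return f007[:2] == "tb"


# Example: a book (Leader/07 'm') with 'd' in 008/23.
leader = "00000nam a2200000 a 4500"
print(is_large_print(leader, " " * 23 + "d" + " " * 16))  # True
```

This follows the simplified test, so Leader/07 'm' alone counts as a book. A stricter version would also require Leader/06 'a' or 't' for books, since Leader/07 'm' also appears on maps, scores, and video records.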
Index | Expected format | Notes |
---|---|---|
date-entered-on-file | yymmdd | 008/00-05; indexed in word and sort indexes |
copydate | yyyy | 260$c; indexed in word and sort indexes |
acqdate | yyyy-mm-dd | 952$d; indexed in date, word, and sort indexes |
pubdate | yyyy | 008/07-10; indexed in year, word, and sort indexes |

Template search parameters tested:
* limit-yr (either yyyy or yyyy-yyyy; added processing for ge, le, structure attribute st-numeric, etc.)
* yr
* pubdate (yyyy)
* acqdate,st-date-normalized (yyyy-mm-dd)

Template sort parameters tested:
* pubdate_dsc
* pubdate_asc
* acqdate_dsc
* acqdate_asc
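The limit-yr parameter accepts either yyyy or yyyy-yyyy and is turned into a ge/le range. Here is a minimal sketch of that conversion, assuming the bounds are inclusive. The `yr,st-numeric,ge=` string it emits is an illustrative CCL-style spelling, not confirmed Koha query syntax, and the function names are hypothetical.

```python
import re
from typing import Tuple

# yyyy or yyyy-yyyy, as accepted by the limit-yr template parameter.
LIMIT_YR = re.compile(r"^\s*(\d{4})\s*(?:-\s*(\d{4})\s*)?$")


def parse_limit_yr(value: str) -> Tuple[int, int]:
    """Return inclusive (low, high) year bounds for a limit-yr value."""
    match = LIMIT_YR.match(value)
    if match is None:
        raise ValueError(f"expected yyyy or yyyy-yyyy, got {value!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    if low > high:
        low, high = high, low  # tolerate reversed ranges such as 2000-1990
    return low, high


def yr_limit_clause(value: str) -> str:
    """Build a ge/le range limit on the yr index.

    Assumption: this CCL-style spelling is for illustration only.
    """
    low, high = parse_limit_yr(value)
    return f"yr,st-numeric,ge={low} and yr,st-numeric,le={high}"


print(yr_limit_clause("1990-2000"))
# yr,st-numeric,ge=1990 and yr,st-numeric,le=2000
```

A single year becomes a range whose bounds are equal, so one code path handles both input forms.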
008/07-10 (Date 1): generally the primary date associated with the publication, distribution, etc. of an item, or the beginning date of a collection.
008/11-14 (Date 2): a secondary date associated with the publication, distribution, etc. of an item, or the ending date of a collection. For books and visual materials, this may be a detailed date representing a month and day.
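260 (publication, distribution, etc.; $c holds the date of publication)
362 (dates of publication and/or sequential designation)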
942$k : stored as yyyymmddhhmmss
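Since pubdate expects yyyy and acqdate expects yyyy-mm-dd, values from 008/07-10 and from a yyyymmddhhmmss timestamp such as 942$k need normalizing before indexing. Here is a small sketch with hypothetical helper names. It treats any non-numeric Date 1 (such as '19uu') as unusable, which is an assumption about how partial dates should be handled.

```python
from datetime import datetime
from typing import Optional


def date1_from_008(f008: str) -> Optional[str]:
    """Return 008/07-10 (Date 1) for the pubdate index, or None if not four digits.

    Assumption: partial dates such as '19uu' are skipped rather than guessed.
    """
    date1 = f008[7:11]
    return date1 if len(date1) == 4 and date1.isdigit() else None


def acqdate_from_timestamp(value: str) -> str:
    """Convert a yyyymmddhhmmss value (as stored in 942$k) to yyyy-mm-dd."""
    return datetime.strptime(value, "%Y%m%d%H%M%S").strftime("%Y-%m-%d")


print(date1_from_008("060315s2005" + " " * 29))  # 2005
print(acqdate_from_timestamp("20060315142530"))  # 2006-03-15
```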
For the Zebra version of Koha, we're breaking up the itemtypes into four categories:
To do this, we are using a combination of several fields in the record to derive each category.
Leader
LDR/06 type of record
MARC Field: 007/00-01 (category of material and specific material designation)
* ta = everything else = 'regular print'
* tb = LP,LPNF,LP J,LP YA,LP JNF,LP YANF = 'large print'
* sd = CDM,AB,JAB,JABN,YAB,YABN,ABN = 'sound disc'
* co = CDR = 'CD-ROM'
* vf = AV,AVJ,AVNF,AVJNF = 'VHS'
* vd = DVD,DVDN,DVDJ,DVDJN = 'DVD'
* ss = JAC,YAC,AC,JACN,YACN,ACN = 'sound cassette'
MARC Field: 008/22 (target audience)
* a = EASY
* b = EASY
* c = J,JNF,JAB,JABN,AVJ,AVJNF,JAC,JACN (juvenile)
* d = YA,YANF,YAB,YABN,YAC,YACN (young adult)
* e = everything else (adult)
* j = J,JNF,JAB,JABN,AVJ,AVJNF,JAC,JACN,DVDJ,DVDJN (juvenile)
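Going the other direction, from a record to the display categories, is a pair of lookups. Here is a sketch assuming 007 and 008 are available as plain strings. The labels come from the 007/00-01 and 008/22 lists, the names are hypothetical, and codes not listed return None (how "everything else" should be labeled on the record side is left open).

```python
from typing import Optional, Tuple

# 007/00-01 -> format label.
FORMAT_BY_007 = {
    "ta": "regular print",
    "tb": "large print",
    "sd": "sound disc",
    "co": "CD-ROM",
    "vf": "VHS",
    "vd": "DVD",
    "ss": "sound cassette",
}

# 008/22 -> audience label; a and b both map to EASY, c and j to juvenile.
AUDIENCE_BY_008_22 = {
    "a": "easy",
    "b": "easy",
    "c": "juvenile",
    "j": "juvenile",
    "d": "young adult",
    "e": "adult",
}


def categorize(f007: str, f008: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (format, audience) labels, or None where a code is not in the tables."""
    fmt = FORMAT_BY_007.get(f007[:2])
    audience = AUDIENCE_BY_008_22.get(f008[22:23])  # '' if 008 is too short
    return fmt, audience


print(categorize("tb", " " * 22 + "j" + " " * 17))  # ('large print', 'juvenile')
```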
MARC Field: 008/33,34
Normal records: 008/33 fiction/non-fiction; 008/34 biography. (What about mystery? Are there any others?)
Video recordings: 008/33 v = videorecording
* 008/34 l = live action
* 008/34 a = animation
* 008/34 c = animation and live action
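For video recordings, the technique code only means something once the record is known to be a videorecording. Here is a sketch, assuming Leader/06 'g' (projected medium) identifies the record as visual material and 008/33 'v' marks a videorecording; the function and table names are hypothetical.

```python
from typing import Optional

# 008/34 (technique) codes for videorecordings.
TECHNIQUE_BY_008_34 = {
    "l": "live action",
    "a": "animation",
    "c": "animation and live action",
}


def video_technique(leader: str, f008: str) -> Optional[str]:
    """Return the technique label for a videorecording, else None.

    Assumes Leader/06 = 'g' and 008/33 = 'v' identify a videorecording.
    """
    if leader[6:7] != "g" or f008[33:34] != "v":
        return None
    return TECHNIQUE_BY_008_34.get(f008[34:35])


print(video_technique("00000ngm a2200000 a 4500", " " * 33 + "vc" + " " * 5))
# animation and live action
```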
Sound recordings: 008/30-31 (a autobiography, b biography, d drama, etc.)

AUDIO BOOKS (LDR nim a 00): 008/30-31, literary text for sound recordings. Codes (# = blank):
* # Item is a music sound recording. When # is used, it is followed by another blank (##).
* a Autobiography
* b Biography
* c Conference proceedings
* d Drama
* e Essays
* f Fiction: includes novels, short stories, etc.
* g Reporting: reports of newsworthy events and informative messages.
* h History: historical narration, etc., that may also be covered by one of the other codes (e.g., historical poetry).
* i Instruction: instructions on how to accomplish a task, learn an art, etc. (e.g., how to replace a light switch). Note: language instruction text is assigned code j.
* j Language instruction: may include passages that fall under the definition of one of the other codes (e.g., language text that includes poetry).
* k Comedy: spoken comedy.
* l Lectures, speeches: literary text is lectures and/or speeches.
* m Memoirs: usually autobiographical.
* n Not applicable: item is not a sound recording (e.g., printed or manuscript music).
* o Folktales
* p Poetry
* r Rehearsals: performances of any of a variety of nonmusical productions.
* s Sounds: nonmusical utterances and vocalizations that may or may not convey meaning.
* t Interviews
* z Other: type of literary text for which none of the other defined codes are appropriate.
* | No attempt to code

MUSIC (LDR njm a 00): 008/30-31 (usually blank); 008/18-19 form of composition. Codes:
* an Anthems
* bd Ballads
* bt Ballets
* bg Bluegrass music
* bl Blues
* cn Canons and rounds: compositions employing strict imitation throughout
* ct Cantatas
* cz Canzonas: instrumental music designated as a canzona
* cr Carols
* ca Chaconnes
* cs Chance compositions
* cp Chansons, polyphonic
* cc Chant, Christian
* cb Chants, Other
* cl Chorale preludes
* ch Chorales
* cg Concerti grossi
* co Concertos
* cy Country music
* df Dance forms: music for individual dances, except those that have separate codes (mazurkas, minuets, pavans, polonaises, and waltzes)
* dv Divertimentos, serenades, cassations, divertissements, and notturni: instrumental music designated as a divertimento, serenade, cassation, divertissement, or notturno
* ft Fantasias: instrumental music designated as fantasia, fancies, fantasies, etc.
* fm Folk music: includes folk songs, etc.
* fg Fugues
* gm Gospel music
* hy Hymns
* jz Jazz
* md Madrigals
* mr Marches
* ms Masses
* mz Mazurkas
* mi Minuets
* mo Motets
* mp Motion picture music
* mc Musical revues and comedies
* mu Multiple forms
* nc Nocturnes
* nn Not applicable: form of composition does not apply to the item; used for any item that is a non-music sound recording
* op Operas
* or Oratorios
* ov Overtures
* pt Part-songs
* ps Passacaglias: includes all types of ostinato basses
* pm Passion music
* pv Pavans
* po Polonaises
* pp Popular music
* pr Preludes
* pg Program music
* rg Ragtime music
* rp Rhapsodies
* rq Requiems
* ri Ricercars
* rc Rock music
* rd Rondos
* sd Square dance music
* sn Sonatas
* sg Songs
* st Studies and exercises: used only when the work is intended for teaching purposes (usually entitled Studies, Etudes, etc.)
* su Suites
* sp Symphonic poems
* sy Symphonies
* tc Toccatas
* ts Trio-sonatas
* uu Unknown: the form of composition is unknown. Used when the only indication given is the number of instruments and the medium of performance; no structure or genre is given, although one may be implied or understood.
* vr Variations
* wz Waltzes
* zz Other: a form of composition for which none of the other defined codes are appropriate (e.g., villancicos, incidental music, electronic music, etc.)
* | No attempt to code
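Audio books and music store their genre in different 008 positions, so the Leader decides which code list applies. Here is a sketch using a subset of the codes listed for each. Each character of 008/30-31 is a separate one-character code, while 008/18-19 is a single two-character code. The dictionaries are abbreviated and the function name is hypothetical.

```python
from typing import List

# Subset of 008/30-31 literary-text codes (one character each).
LITERARY_TEXT = {
    "a": "Autobiography", "b": "Biography", "d": "Drama", "f": "Fiction",
    "h": "History", "i": "Instruction", "k": "Comedy", "p": "Poetry",
}

# Subset of 008/18-19 form-of-composition codes (two characters).
COMPOSITION_FORM = {
    "bl": "Blues", "jz": "Jazz", "op": "Operas", "pp": "Popular music",
    "rc": "Rock music", "sy": "Symphonies",
}


def sound_recording_genres(leader: str, f008: str) -> List[str]:
    """Decode genre codes for a sound recording (hypothetical helper).

    LDR/06 'i' (nonmusical, e.g. audio books): 008/30 and 008/31 are separate codes.
    LDR/06 'j' (musical): 008/18-19 is a single two-character code.
    """
    record_type = leader[6:7]
    if record_type == "i":
        return [LITERARY_TEXT[c] for c in f008[30:32] if c in LITERARY_TEXT]
    if record_type == "j":
        form = COMPOSITION_FORM.get(f008[18:20])
        return [form] if form else []
    return []


print(sound_recording_genres("00000nim a2200000 a 4500", " " * 30 + "fb" + " " * 8))
# ['Fiction', 'Biography']
```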
This gets tricky. Please keep in mind that I haven't had any formal library science training and the following is what I've gleaned by working with librarians from many different systems. Every library seems to handle these issues differently, but here are some definitions that I hope are universal:
Libraries typically condense the above elements to make record maintenance and searching easier. For instance, NPL currently uses a simplified scheme that consists of the following:
Name | Use | Composition | Location |
---|---|---|---|
Item Type | general shelving location, circulation rules | locally defined | 942$c |
Call Number | shelf order, subject classification | from Dewey or locally defined | 942$c |
For Koha 2.4, we're proposing to change that scheme slightly to enable better search options in the catalog. Here is the scheme that we're proposing:
Name | Use | Composition | Location |
---|---|---|---|
Classification | subject classification | Dewey | 082 |
Collection Code (itemtype) | circulation rules, general shelving location | locally defined | 942$c |
Call Number | shelf order | Local Call Number (fiction) or Classification (non-fiction) | ? |
Local Call Number | shelf order | NPL's local call number scheme ( <itemtype> <author's last name> ) | 942$c |
Item Call Number | inventory | Call Number | 952? |
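In the proposed scheme, Call Number is derived rather than stored independently: the local call number for fiction and the Dewey classification (082) for non-fiction. Here is a sketch with hypothetical names and example values. Falling back to the local call number when a non-fiction record has no 082 is an assumption, not part of the proposal.

```python
from typing import Optional


def local_call_number(itemtype: str, author_last_name: str) -> str:
    """NPL's local scheme: <itemtype> <author's last name>."""
    return f"{itemtype} {author_last_name}".strip()


def call_number(itemtype: str, author_last_name: str,
                dewey: Optional[str], is_fiction: bool) -> str:
    """Local call number for fiction, classification (082) for non-fiction.

    Assumption: non-fiction without an 082 falls back to the local call number.
    """
    if is_fiction or not dewey:
        return local_call_number(itemtype, author_last_name)
    return dewey


print(call_number("LP", "King", None, True))  # LP King
```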
Looking forward, we may want to adopt an even more complete scheme such as the following:
Name | Use | Composition | Location |
---|---|---|---|
Classification | subject classification | Dewey | 082 |
Collection Code | circulation rules | locally defined | 942$c |
Shelving Location Code | location of item (new items, general stacks, mysteries and sci-fi, etc.) | locally defined | ? |
Call Number | shelf order | Local Call Number (fiction) or Classification (non-fiction) | ? |
Local Call Number | shelf order | NPL's local call number scheme ( <itemtype> <author's last name> ) | 942$c |
Item Call Number | inventory | Call Number + some other identifier | ? |
Here are some additional thoughts on the topic of Material Organization
Here is a list of requests I know about: