In versions of Koha prior to 2.4, the goal with Koha's MARC support was to get a functioning ILS in place that was capable of storing MARC records correctly. But now we have a more ambitious goal: we want our ILS to be capable of searching the semantic information in MARC records to the fullest extent possible. A secondary goal is to provide easy access from the Online Catalog to resources that extend beyond just the bibliographic records for library holdings.
This Wiki page provides a workspace where Koha developers, cataloging staff, and general staff can post ideas, requests, and questions for how Koha handles searching (and display) of bibliographic records and access to other resources.
For those who want to know more about Koha's Zebra configuration, I propose setting up a KohaZebraConfiguration page that lists each Zebra attribute with an explanation and a link to the corresponding field in MARC 21 and in UNIMARC.
There are many considerations in constructing a working definition of the Koha Catalog. Ultimately, our working definition will consist of individual goals. An example of a goal might be “I want to be able to search for an exact title like ‘It’ by Stephen King and have it be the first record in the result set.” To realize a given goal, we must define a set of practices in four areas:
Search Indexes
The indexes are where we define:
MARC Frameworks
Koha's MARC Frameworks are where we define:
Cataloging
Consistent cataloging practices are, together with Frameworks and Indexes, an essential component of searching. Here are some things to think about:
Interface Design
The Koha OPAC is an interface through which patrons and staff construct queries of the data. The interface needs to be fast, accurate, and intuitive to use if it is to be a useful search tool of the library's collections.
Our task, then, is to construct a working set of expectations and definitions for the four areas above. The definitions can then be applied directly to each area to realize a given search goal.
008/24-27 review, catalog, encyclopedia, directory.
MARC records don't have a consistent way to distinguish between copyright and publication dates (as far as I can tell), so we have two date types to think about: copyright/publication, and acquisition. Here are some related MARC fields for each:
First, some (I hope) simple ones:
* Regular Print 007/00-01='ta'
* CD Audio 007/00-01='sd'
* Cassette Recording 007/00-01='ss'
* VHS 007/00-01='vf'
* DVD 007/00-01='vd'
* CD Software 007/00-01='co'
* Braille 007/00-01='fb'
d in 008/23 and 006/06 Form of item (BK MU SE MX)
d in 008/29 and 006/12 Form of item (MP VM)
d in 008/22 and 006/05 Form of original item (SE)
b in 007/01 Specific material designation
008 'd'/23 with the following:
BK: Leader/07 (Bibliographic level) 'a' (Monographic component part), 'c' (Collection), 'm' (Monograph/Item)
MU: Leader/06 (Type of Record) 'c' (Printed music), 'd' (Manuscript music), 'i' (Non-musical sound recording), or 'j' (Musical sound recording)
SE: Leader/07 's' (Serial) or 'b' (Serial component part).
MX: Leader/06 'p'
008 - 'd'/29 with the following:
MP:Leader/06 (Type of Record) by code e (Printed map) or code f (Manuscript map)
VM: Leader/06 'g', 'k', 'o', 'r'
008 - 'd'/22
SE: Leader/07 by code s (Serial) or code b (Serial component part).
007 - 'tb'
So … Something is Large print if:
(
  ( ((LDR-07='a' or LDR-07='c' or LDR-07='m') and 008-23='d') or
    ((LDR-06='c' or LDR-06='d' or LDR-06='i' or LDR-06='j') and 008-23='d') or
    ((LDR-07='s' or LDR-07='b') and 008-23='d') or
    (LDR-06='p' and 008-23='d') ) or
  ( ((LDR-06='e' or LDR-06='f') and 008-29='d') or
    ((LDR-06='g' or LDR-06='k' or LDR-06='o' or LDR-06='r') and 008-29='d') ) or
  ( (LDR-07='s' or LDR-07='b') and 008-22='d' ) or
  007/00-01='tb'
)
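The rule can also be written as a small predicate. This is only a sketch, not Koha code: the name `is_large_print` and the idea of passing the leader, 008 and 007 as plain strings are assumptions made for the example. It follows the position lists given above (008/23 for BK, MU, SE and MX; 008/29 for MP and VM; 008/22 for the serial form of original item; 007/00-01 'tb').

```python
def _char(field, pos):
    """Return the character at a 0-based position, or '' if the field is too short."""
    return field[pos] if field is not None and len(field) > pos else ""


def is_large_print(leader, f008="", f007=""):
    """Apply the large-print rule to raw control-field strings.

    leader, f008 and f007 are the MARC leader, 008 and 007 as plain strings;
    this interface is an assumption made for the example.
    """
    ldr06 = _char(leader, 6)   # type of record
    ldr07 = _char(leader, 7)   # bibliographic level

    # 008/23 'd' applies to BK, MU, SE and MX records.
    form_23 = _char(f008, 23) == "d" and (
        ldr07 in ("a", "c", "m")            # BK
        or ldr06 in ("c", "d", "i", "j")    # MU
        or ldr07 in ("s", "b")              # SE
        or ldr06 == "p"                     # MX
    )
    # 008/29 'd' applies to MP and VM records.
    form_29 = _char(f008, 29) == "d" and ldr06 in ("e", "f", "g", "k", "o", "r")
    # 008/22 'd' is the form of original item for serials.
    form_22 = _char(f008, 22) == "d" and ldr07 in ("s", "b")
    # 007/00-01 'tb' is text, large print.
    phys = (f007 or "")[:2] == "tb"

    return form_23 or form_29 or form_22 or phys
```

Writing it this way keeps each MARC position in one place, so a change to one of the lists above touches a single line.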
| Index | Expected format | Notes |
|---|---|---|
| date-entered-on-file | yymmdd | 008/00-05, indexed in word and sort indexes |
| copydate | yyyy | 260$c, indexed in word and sort indexes |
| acqdate | yyyy-mm-dd | 952$d, indexed in date, word, sort indexes |
| pubdate | yyyy | 008/07-10, indexed in year, word, sort indexes |
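As a rough illustration of where these values come from, the sketch below slices the 008 positions listed above and pulls a year out of 260$c. The function name `date_indexes` and the plain-string inputs are assumptions, and taking the first four-digit run from 260$c is a guess at a reasonable normalization, not a description of what Koha's indexing actually does. acqdate is left out because 952$d is already expected in yyyy-mm-dd form.

```python
import re


def date_indexes(f008, f260c=""):
    """Pull the date index values out of an 008 string and a 260$c string."""
    entered = f008[0:6]    # 008/00-05, yymmdd
    pubdate = f008[7:11]   # 008/07-10, yyyy
    # 260$c is free text such as "c1986." so take the first four-digit run.
    match = re.search(r"\d{4}", f260c)
    return {
        # Values that are not all digits (e.g. "19uu") are reported as None.
        "date-entered-on-file": entered if re.fullmatch(r"\d{6}", entered) else None,
        "pubdate": pubdate if re.fullmatch(r"\d{4}", pubdate) else None,
        "copydate": match.group(0) if match else None,
    }
```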
Template Search Parameters Tested:
limit-yr (either yyyy or yyyy-yyyy) (added processing for ge le, structure attribute st-numeric, etc.)
yr pubdate (yyyy)
acqdate,st-date-normalized (yyyy-mm-dd)
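The limit-yr value accepts either a single year or a range, which then has to become a pair of ge / le bounds. A minimal sketch of that step, assuming a hypothetical helper `parse_limit_yr` (how the bounds are then handed to Zebra is not shown here):

```python
import re


def parse_limit_yr(value):
    """Turn 'yyyy' or 'yyyy-yyyy' into an inclusive (start, end) pair of ints."""
    match = re.fullmatch(r"\s*(\d{4})(?:\s*-\s*(\d{4}))?\s*", value)
    if not match:
        raise ValueError(f"expected yyyy or yyyy-yyyy, got {value!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    if end < start:
        start, end = end, start   # tolerate a reversed range
    return start, end
```

A single year comes back as an identical pair, so the caller can always apply "ge start" and "le end" without a special case.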
Template Sort Parameters Tested:
pubdate_dsc
pubdate_asc
acqdate_dsc
acqdate_asc
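The sort parameters follow an index_direction pattern. The sketch below shows one way to split and apply them; `parse_sort` and `sort_records` are hypothetical names, and the records are plain dictionaries standing in for whatever the search layer returns.

```python
def parse_sort(param):
    """Split a sort parameter such as 'pubdate_dsc' into (index, descending)."""
    index, sep, direction = param.rpartition("_")
    if not sep or direction not in ("asc", "dsc"):
        raise ValueError(f"unrecognised sort parameter: {param!r}")
    return index, direction == "dsc"


def sort_records(records, param):
    """Sort a list of dicts by the index named in the sort parameter."""
    index, descending = parse_sort(param)
    return sorted(records, key=lambda r: r[index], reverse=descending)
```

Because pubdate (yyyy) and acqdate (yyyy-mm-dd) are fixed-width strings, sorting them as text gives the same order as sorting them as dates.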
008 / 07-10 : generally a primary date associated with the publication, distribution, etc. of an item and the beginning date of a collection
008 / 11-14 : secondary date associated with the publication, distribution, etc. of an item and the ending date of a collection. For books and visual materials, this may be a detailed date which represents a month and day.
260
362
942$k : stored as yyyymmddhhmmss
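Since 942$k is stored as yyyymmddhhmmss while the acqdate index expects yyyy-mm-dd, a conversion like the following may be useful when comparing the two. The helper name `timestamp_to_date` is an assumption, and nothing here implies that Koha itself converts 942$k this way.

```python
from datetime import datetime


def timestamp_to_date(value):
    """Convert a 'yyyymmddhhmmss' string to 'yyyy-mm-dd'."""
    parsed = datetime.strptime(value, "%Y%m%d%H%M%S")
    return parsed.strftime("%Y-%m-%d")
```

`strptime` raises `ValueError` on malformed input, which is a convenient way to spot records whose timestamp was entered by hand.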
For the Zebra version of Koha, we're breaking up the itemtypes into four categories:
To do this, we are using a combination of several fields in the record to derive each category.
Leader
LDR/06 type of record
MARC Field: 007/1,2 (form of item)
ta = everything else = 'regular print'
tb = LP,LPNF,LP J, LP YA,LP JNF,LP YANF = 'large print'
sd = CDM,AB,JAB,JABN,YAB,YABN,ABN, = 'sound disk'
co = CDR = 'CD-ROM'
vf = AV,AVJ,AVNF,AVJNF = 'VHS'
vd = DVD,DVDN,DVDJ,DVJN = 'DVD'
ss = JAC,YAC,AC,JACN,YACN,ACN = 'sound cassette'
MARC Field: 008/22 (target audience)
a = EASY
b = EASY
c = J,JNF,JAB,JABN,AVJ,AVJNF,JAC,JACN (juvenile)
d = YA,YANF,YAB,YABN,YAC,YACN (young adult)
e = everything else (adult)
j = J,JNF,JAB,JABN,AVJ,AVJNF,JAC,JACN,DVDJ,DVDJN (juvenile)
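To make the "combination of several fields" idea concrete, here is a small sketch that derives a format and an audience from the 007 and 008 using the two mappings above. The function name `describe`, the plain-string inputs and the "unknown" fallback are assumptions for illustration only.

```python
# First two characters of the 007 -> format label (from the list above).
FORMAT_BY_007 = {
    "ta": "regular print",
    "tb": "large print",
    "sd": "sound disk",
    "co": "CD-ROM",
    "vf": "VHS",
    "vd": "DVD",
    "ss": "sound cassette",
}

# 008/22 -> audience label (from the list above).
AUDIENCE_BY_008_22 = {
    "a": "EASY",
    "b": "EASY",
    "c": "juvenile",
    "d": "young adult",
    "e": "adult",
    "j": "juvenile",
}


def describe(f007, f008):
    """Return (format, audience) for a record's 007 and 008 strings."""
    fmt = FORMAT_BY_007.get(f007[:2], "unknown")          # assumed fallback
    audience = AUDIENCE_BY_008_22.get(f008[22:23], "unknown")
    return fmt, audience
```

Keeping the two tables as data rather than as if/else chains means a library with different local itemtypes only has to edit the dictionaries.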
MARC Field: 008/33,34
normal records:
008 / 33 fiction/non-fiction
008 / 34 biography (what about mystery? Are there any others?)
video recordings:
008 / 33 v = videorecording
008 / 34 l = live action
008 / 34 a = animation
008 / 34 c = animation and live action
sound recordings:
008 / 30-31 a autobiography
b biography
d drama
etc.
AUDIO BOOKS
LDR nim a 00
008/ 30, 31
Guidelines for applying content designators:
| Code | Description | Notes |
|---|---|---|
| # | Item is a music sound recording | When # is used, it is followed by another blank (##). |
| a | Autobiography | |
| b | Biography | |
| c | Conference proceedings | |
| d | Drama | |
| e | Essays | |
| f | Fiction | Fiction includes novels, short stories, etc. |
| g | Reporting | Reports of news-worthy events and informative messages are included in this category. |
| h | History | History includes historical narration, etc., that may also be covered by one of the other codes (e.g., historical poetry). |
| i | Instruction | Instructional text includes instructions on how to accomplish a task, learn an art, etc. (e.g., how to replace a light switch). Note: Language instruction text is assigned code j. |
| j | Language instruction | Language instructional text may include passages that fall under the definition for one of the other codes (e.g., language text that includes poetry). |
| k | Comedy | Spoken comedy. |
| l | Lectures, speeches | Literary text is lectures and/or speeches. |
| m | Memoirs | Memoirs are usually autobiographical. |
| n | Not applicable | Item is not a sound recording (e.g., printed or manuscript music). |
| o | Folktales | |
| p | Poetry | |
| r | Rehearsals | Rehearsals are performances of any of a variety of nonmusical productions. |
| s | Sounds | Sounds include nonmusical utterances and vocalizations that may or may not convey meaning. |
| t | Interviews | |
| z | Other | Type of literary text for which none of the other defined codes are appropriate. |
| \| | No attempt to code | |
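Because 008/30-31 holds up to two of these codes, decoding it means looking at each position separately and skipping blanks. A minimal sketch, assuming a hypothetical helper `literary_text` that takes the 008 as a plain string:

```python
LITERARY_TEXT = {
    "a": "Autobiography", "b": "Biography", "c": "Conference proceedings",
    "d": "Drama", "e": "Essays", "f": "Fiction", "g": "Reporting",
    "h": "History", "i": "Instruction", "j": "Language instruction",
    "k": "Comedy", "l": "Lectures, speeches", "m": "Memoirs",
    "n": "Not applicable", "o": "Folktales", "p": "Poetry",
    "r": "Rehearsals", "s": "Sounds", "t": "Interviews", "z": "Other",
}


def literary_text(f008):
    """Decode 008/30-31 into a list of literary text labels."""
    labels = []
    for code in f008[30:32]:
        if code in (" ", "#", "|"):
            continue   # blank or fill character: nothing to report
        labels.append(LITERARY_TEXT.get(code, f"unknown code {code!r}"))
    return labels
```

For example, an 008 with "fb" in positions 30-31 decodes to Fiction and Biography, while a music recording with two blanks yields an empty list.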
MUSIC
LDR njm a 00
008 / 30,31 (usually blank)
008 / 18,19 composition form
Guidelines for applying content designators:
| Code | Description | Notes |
|---|---|---|
| an | Anthems | |
| bd | Ballads | |
| bt | Ballets | |
| bg | Bluegrass music | |
| bl | Blues | |
| cn | Canons and rounds | i.e., compositions employing strict imitation throughout |
| ct | Cantatas | |
| cz | Canzonas | Instrumental music designated as a canzona. |
| cr | Carols | |
| ca | Chaconnes | |
| cs | Chance compositions | |
| cp | Chansons, polyphonic | |
| cc | Chant, Christian | |
| cb | Chants, Other | |
| cl | Chorale preludes | |
| ch | Chorales | |
| cg | Concerti grossi | |
| co | Concertos | |
| cy | Country music | |
| df | Dance forms | Includes music for individual dances except those that have separate codes defined: mazurkas, minuets, pavans, polonaises, and waltzes. |
| dv | Divertimentos, serenades, cassations, divertissements, and notturni | Instrumental music designated as a divertimento, serenade, cassation, divertissement, or notturno. |
| ft | Fantasias | Instrumental music designated as fantasia, fancies, fantasies, etc. |
| fm | Folk music | Includes folk songs, etc. |
| fg | Fugues | |
| gm | Gospel music | |
| hy | Hymns | |
| jz | Jazz | |
| md | Madrigals | |
| mr | Marches | |
| ms | Masses | |
| mz | Mazurkas | |
| mi | Minuets | |
| mo | Motets | |
| mp | Motion picture music | |
| mc | Musical revues and comedies | |
| mu | Multiple forms | |
| nc | Nocturnes | |
| nn | Not applicable | Indicates that form of composition is not applicable to the item. Used for any item that is a non-music sound recording. |
| op | Operas | |
| or | Oratorios | |
| ov | Overtures | |
| pt | Part-songs | |
| ps | Passacaglias | Includes all types of ostinato basses. |
| pm | Passion music | |
| pv | Pavans | |
| po | Polonaises | |
| pp | Popular music | |
| pr | Preludes | |
| pg | Program music | |
| rg | Ragtime music | |
| rp | Rhapsodies | |
| rq | Requiems | |
| ri | Ricercars | |
| rc | Rock music | |
| rd | Rondos | |
| sd | Square dance music | |
| sn | Sonatas | |
| sg | Songs | |
| st | Studies and exercises | Used only when the work is intended for teaching purposes (usually entitled Studies, Etudes, etc.). |
| su | Suites | |
| sp | Symphonic poems | |
| sy | Symphonies | |
| tc | Toccatas | |
| ts | Trio-sonatas | |
| uu | Unknown | Indicates that the form of composition of an item is unknown. Used when the only indication given is the number of instruments and the medium of performance. No structure or genre is given, although they may be implied or understood. |
| vr | Variations | |
| wz | Waltzes | |
| zz | Other | Indicates a form of composition for which none of the other defined codes are appropriate (e.g., villancicos, incidental music, electronic music, etc.). |
| \| | No attempt to code | |
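Putting the audio book and music notes together: LDR/06 decides which kind of sound recording a record is, and that in turn decides which 008 positions to read. The sketch below assumes a hypothetical helper `sound_recording_kind` and carries only a handful of the form-of-composition codes to keep the example short.

```python
FORM_OF_COMPOSITION = {   # small subset of the list above
    "bl": "Blues", "cy": "Country music", "jz": "Jazz",
    "op": "Operas", "rc": "Rock music", "sy": "Symphonies",
}


def sound_recording_kind(leader, f008):
    """Return (kind, detail) where kind is 'audio book', 'music' or None."""
    record_type = leader[6:7]
    if record_type == "i":          # LDR/06 'i': non-musical sound recording
        return "audio book", f008[30:32].strip()   # raw literary text codes
    if record_type == "j":          # LDR/06 'j': musical sound recording
        code = f008[18:20]
        # Codes missing from the subset are returned unchanged.
        return "music", FORM_OF_COMPOSITION.get(code, code)
    return None, ""
```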
This gets tricky. Please keep in mind that I haven't had any formal library science training and the following is what I've gleaned by working with librarians from many different systems. Every library seems to handle these issues differently, but here are some definitions that I hope are universal:
Libraries typically reduce the above elements to a smaller set to ease record maintenance and searching of materials. For instance, NPL currently uses a simplified scheme that consists of the following:
| Name | Use | Composition | Location |
|---|---|---|---|
| Item Type | general shelving location, circulation rules | locally defined | 942$c |
| Call Number | shelf order, subject classification | from Dewey or locally defined | 942$c |
For Koha 2.4, we're proposing to change that scheme slightly to enable better search options in the catalog. Here is the scheme that we're proposing:
| Name | Use | Composition | Location |
|---|---|---|---|
| Classification | subject classification | Dewey | 082 |
| Collection Code (itemtype) | circulation rules, general shelving location | locally defined | 942$c |
| Call Number | shelf order | Local Call Number (fiction) or Classification (non-fiction) | ? |
| Local Call Number | shelf order | NPL's local call number scheme ( <itemtype> <author's last name> ) | 942$c |
| Item Call Number | inventory | Call Number | 952? |
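The Call Number row in this proposal is really a small rule: fiction is shelved by the local call number, non-fiction by classification. A sketch of that rule follows; the function names are hypothetical, the `is_fiction` flag is assumed to be decided elsewhere (for instance from 008/33), and the exact spelling and casing NPL uses for the local call number is not specified here.

```python
def local_call_number(itemtype, author_last_name):
    """Build '<itemtype> <author's last name>'."""
    return f"{itemtype} {author_last_name}".strip()


def call_number(is_fiction, itemtype, author_last_name, classification):
    """Pick the shelf-order call number described in the table above."""
    if is_fiction:
        return local_call_number(itemtype, author_last_name)
    return classification
```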
Looking forward, we may want to adopt an even more complete scheme such as the following:
| Name | Use | Composition | Location |
|---|---|---|---|
| Classification | subject classification | Dewey | 082 |
| Collection Code | circulation rules | locally defined | 942$c |
| Shelving Location Code | location of item (new items, general stacks, mysteries and sci-fi, etc.) | locally defined | ? |
| Call Number | shelf order | Local Call Number (fiction) or Classification (non-fiction) | ? |
| Local Call Number | shelf order | NPL's local call number scheme ( <itemtype> <author's last name> ) | 942$c |
| Item Call Number | inventory | Call Number + some other identifier | ? |
Here are some additional thoughts on the topic of Material Organization
Here is a list of requests I know about: