Installing and Configuring Koha's Zebra Plugin (for Koha 2.2.6)

Joshua Ferraro (jmf AT liblime DOT com)
modified by Paul Poulain (paul AT koha-fr DOT org)

Introduction

Koha's Zebra plugin is a new feature with 2.2.6 that allows an otherwise ordinary rel_2_2 Koha to use Zebra for bibliographic data storage, search and retrieval. Why you would want to integrate Koha and Zebra is a topic for another document. This guide assumes you're sold on the idea, and already have some experience managing a Koha system. In it, we'll walk through the process of:

configuring your system
symlinking your installation environment to a 'dev-week' CVS repository
making needed changes to your Koha MySQL database
installing, configuring, and starting Zebra
importing your data

Before following this install document please refer to the “Installing Koha (2.2.6) on Debian Sarge” and the “Updating Koha” documents available from http://kohadocs.org. The assumption is that you've already got Koha 2.2.6 installed and a working knowledge of how to symlink a CVS working repository to your installation. If you don't know what that means, DON'T PROCEED. The Zebra integration adds quite a bit of complexity to the installation and maintenance of Koha, so be warned.

I also highly recommend you read over the Zebra docs at http://indexdata.dk/zebra if you're going to be managing a Zebra installation.

Finally, DO NOT perform these steps on a production system unless you have fully tested them on a test system and are comfortable with the process. Doing otherwise could lead to serious data and configuration loss. And of course, before doing anything, please back up your data.

Preparing the server for Zebra

Base Perl packages

MARC::Record

We have to install the MARC::Record module manually from SourceForge now as the CPAN version of MARC::Record isn't up-to-date.

 cvs -z3 -d:pserver:anonymous@marcpm.cvs.sourceforge.net:/cvsroot/marcpm co -P marc-record
 cvs -z3 -d:pserver:anonymous@marcpm.cvs.sourceforge.net:/cvsroot/marcpm co -P marc-charset
 cvs -z3 -d:pserver:anonymous@marcpm.cvs.sourceforge.net:/cvsroot/marcpm co -P marc-lint
 cvs -z3 -d:pserver:anonymous@marcpm.cvs.sourceforge.net:/cvsroot/marcpm co -P marc-xml

 cd marc-record
 perl Makefile.PL
 make
 make install
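
Before moving on, it can be worth confirming that Perl actually picks up the freshly installed module. The following is only a sketch and not part of the Koha or MARC::Record tooling; the function name perl_module_version is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Koha or MARC::Record): print the version
# of an installed Perl module, or return non-zero if Perl cannot load it.
perl_module_version() {
    # -M loads the module; VERSION is the standard UNIVERSAL method.
    perl -M"$1" -e 'print $ARGV[0]->VERSION, "\n"' "$1" 2>/dev/null
}

# Example: report on the module installed above.
if v=$(perl_module_version MARC::Record); then
    echo "MARC::Record $v is installed"
else
    echo "MARC::Record could not be loaded" >&2
fi
```

If the module cannot be loaded, re-check that "make install" ran as a user allowed to write to your Perl library directory.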

Install Yaz, Zebra, Net::Z3950::ZOOM

On Debian

Put the following in your /etc/apt/sources.list

  # for Yaz Toolkit 
  deb http://ftp.indexdata.dk/debian indexdata/sarge released 
  deb-src http://ftp.indexdata.dk/debian indexdata/sarge released

Now run

  # apt-get update && apt-get install idzebra

(yaz will automatically be installed as it's a dependency)

Install the latest version of Net::Z3950::ZOOM from CPAN:

 # perl -MCPAN -e 'install Net::Z3950::ZOOM'

On other systems

Get the latest Yaz and Zebra sources from http://www.indexdata.com/yaz/ and http://www.indexdata.com/zebra/.

Install Yaz:

 # tar xvfz yaz-version.tar.gz
 # cd yaz-version
 # ./configure
 # make
 # make install

Then install Zebra:

 # tar xvfz idzebra-version.tar.gz
 # cd idzebra-version
 # ./configure
 # make
 # make install

Install the latest version of Net::Z3950::ZOOM from CPAN:

 # perl -MCPAN -e 'install Net::Z3950::ZOOM'
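
Whichever route you took, a quick sanity check is to confirm that the Zebra programs used later in this guide are on your PATH. This is a minimal sketch; require_commands is a hypothetical helper name, not something shipped with Yaz or Zebra.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every named command is on the PATH.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# zebraidx and zebrasrv are the two Zebra programs used later in this guide.
if require_commands zebraidx zebrasrv; then
    echo "Zebra binaries found"
fi
```

A source install typically lands in /usr/local/bin, so a "missing" report may just mean that directory is not on root's PATH.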

Prepare the filesystem

Check out dev-week from CVS

 # cvs -z3 -d:pserver:anonymous@cvs.savannah.nongnu.org:/sources/koha export -r dev_week koha

 NOTE: This is not a 'check out' but an 'export'. The main difference is that an 'export' contains no CVS directories.

Symlink your Koha 2.2.6 install environment to the dev-week 'working copy' (see the 'Updating Koha' document for details)

The zebraplugin directory

In the dev-week Koha CVS repository you'll find a zebraplugin directory that contains all the files you'll need to set up Zebra.

etc

Within the etc directory, you'll find a koha.xml file that is a replacement for the koha.conf file in rel_2_2. This file is where you specify the location of many of the files in the zebraplugin directory. You'll need to pick a directory structure that works with your configuration and edit the file accordingly. For instance, on my systems, I have a structure like the following:

 /koha
 |-- cvsrepos
 |-- etc
 |-- intranet
 |-- log
 |-- opac
 |-- utils
 `-- zebradb

The default plugin koha.xml uses this directory structure as a point of reference (the etc and zebradb directories above correspond to the directories of the same name in the zebraplugin directory).
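
If you want to mirror that layout, the directories can be created in one go. The sketch below assumes nothing beyond the directory names shown in the tree above; make_koha_tree is a hypothetical helper, and the example builds the tree in a scratch location rather than at /koha so nothing existing is touched.

```shell
#!/bin/sh
# Hypothetical helper: create the example directory layout under a given root.
make_koha_tree() {
    root=$1
    for dir in cvsrepos etc intranet log opac utils zebradb; do
        mkdir -p "$root/$dir" || return 1
    done
}

# Example: build the layout in a scratch directory first to inspect it.
scratch=$(mktemp -d)
make_koha_tree "$scratch/koha"
ls "$scratch/koha"
```

Once you are happy with the result, call the function with the real root you chose and edit koha.xml to match.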

zebradb

This directory contains the filesystem that will store all of Zebra's indexes. The only file you should need to edit in the zebradb file structure is the kohalis file within biblios/tab. This file should contain the user/password specified in the koha.xml <zebrauser> directive.

Depending on your system, you may also need to adjust some idzebra directory paths. On my Mandriva system, for example, the Zebra parameter files are in /usr/local/share/idzebra rather than /usr/share/idzebra. To check where Zebra is installed, run:

 which zebraidx

If the answer is

 /usr/local/bin/zebraidx

then update zebra-biblios.cfg & zebra-authorities.cfg and modify the line

 profilePath:${srcdir:-.}:/usr/share/idzebra/tab/:/koha/zebraplugin/zebradb/biblios/tab/:${srcdir:-.}/tab/

to

 profilePath:${srcdir:-.}:/usr/local/share/idzebra/tab/:/koha/zebraplugin/zebradb/biblios/tab/:${srcdir:-.}/tab/
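
The same edit can be scripted so that both config files are changed identically. This is a sketch only: fix_profile_path is a hypothetical helper, and the config file locations in the usage comment are examples that may not match your layout.

```shell
#!/bin/sh
# Hypothetical helper: point profilePath at /usr/local/share/idzebra/tab/
# in one Zebra config file, keeping the original as FILE.bak.
fix_profile_path() {
    cfg=$1
    cp "$cfg" "$cfg.bak" || return 1
    # Only lines starting with "profilePath:" are touched.
    sed '/^profilePath:/s|:/usr/share/idzebra/tab/|:/usr/local/share/idzebra/tab/|' \
        "$cfg.bak" > "$cfg"
}

# Usage (paths are examples; adjust to where your config files live):
#   fix_profile_path /koha/etc/zebra-biblios.cfg
#   fix_profile_path /koha/etc/zebra-authorities.cfg
```

Running it twice is harmless, because the rewritten line no longer contains the old path, but the second run does overwrite the .bak copy.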

utils

The utils directory contains the utilities you'll need to perform the rest of the installation / upgrade, which brings us to …

Modify the SQL database

Here are tasks you'll want to perform whether or not this is a brand new Koha install:

updatedatabase (using updatedatabase from rel_2_2)
update to the latest bib framework
convert_to_utf8.pl (from dev-week)

If you're migrating from a previous version of Koha (very likely) you'll need to also do the following:

drop table if exists marc_word
alter table items modify binding tinyint(1) default NULL
alter table systempreferences change value value text
alter table systempreferences change explanation explanation text
import the biblioframework.sql file from zebraplugin/utils
import the phrase_log.sql file from zebraplugin/utils
run missing090field.pl (from dev-week using dev_week C4)
run biblio_framework.sql from within the mysql monitor (from dev-week using dev_week C4)
run phrase_log.sql from within the mysql monitor (from dev-week using dev_week C4)
export your MARC records using export.pl
run them through a preprocess routine to convert to utf-8 (unless they are already utf-8)
double-check again for missing 090 fields (very critical as indexing will fail if you're missing 090 fields)
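
The four schema changes in the list above can be collected into a single file so they are reviewed and applied together. This is a sketch under assumptions: write_migration_sql is a hypothetical helper, the output file name is arbitrary, and the MySQL user and database names in the comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: write the four schema changes listed above to a file.
write_migration_sql() {
    cat > "$1" <<'SQL'
DROP TABLE IF EXISTS marc_word;
ALTER TABLE items MODIFY binding tinyint(1) DEFAULT NULL;
ALTER TABLE systempreferences CHANGE value value text;
ALTER TABLE systempreferences CHANGE explanation explanation text;
SQL
}

sql_file=${TMPDIR:-/tmp}/zebra_migration.sql
write_migration_sql "$sql_file"
echo "wrote $sql_file"
# Review the file, then apply it to a test copy of your database, e.g.:
#   mysql -u kohaadmin -p Koha < /tmp/zebra_migration.sql
# (the user "kohaadmin" and database "Koha" are placeholders)
```

The remaining steps in the list (the .sql imports and the Perl scripts) still have to be run separately, in the order given.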

NOTE: Don't run rebuildnonmarc ever -- it will ruin your items!!! bug paul about this :-)

Importing Data

If you're upgrading an existing Koha installation, your MySQL database already contains the record data, so all we need to do is import the bibliographic data into Zebra. We can do this thusly:

 # zebraidx -g iso2709 -c /koha/etc/zebra-biblios.cfg -d biblios update /path/to/records
 # zebraidx -g iso2709 -c /koha/etc/zebra-biblios.cfg -d biblios commit

In these commands:

 -g names the record group (here iso2709); files in the same group have the same extension
 -c gives the location of the Zebra config file
 -d names the Zebra database to update (here biblios)
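
Because the commit only makes sense after a successful update, the two calls can be chained. The wrapper below is a sketch, not part of Koha or Zebra: index_records and the ZEBRAIDX variable are made-up names, and the flags are exactly those used in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper around the two zebraidx calls shown above.
# Set ZEBRAIDX=echo to print the arguments instead of running zebraidx.
index_records() {
    cfg=$1
    records=$2
    zebraidx_cmd=${ZEBRAIDX:-zebraidx}
    # "commit" runs only when "update" exits successfully.
    "$zebraidx_cmd" -g iso2709 -c "$cfg" -d biblios update "$records" &&
        "$zebraidx_cmd" -g iso2709 -c "$cfg" -d biblios commit
}

# Dry run: shows what would be passed to zebraidx without touching any index.
ZEBRAIDX=echo
index_records /koha/etc/zebra-biblios.cfg /path/to/records
```

Unset ZEBRAIDX (or remove that line) to run the real indexer.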

If you need to batch import records that don't exist in your Koha installation, you can use bulkmarcimport as with rel_2_2:

 # cd /path/to/dev_week/repo/
 # export KOHA_CONF=/path/to/koha.xml
 # perl misc/migration_tools/bulkmarcimport /path/to/records.mrc

Starting Zebra

 zebrasrv -f /koha/etc/koha.xml

UTF-8 problems

WARNING: the following section appears to be wrong. kados disagrees with this patch, and I must admit that it doesn't solve the problem. The problem seems to come from the parser behind MARC::File::XML. Henri Damien will investigate it over the next 2 weeks.

At the time of writing, there is a bug in the MARC::Record package's UTF-8 (Unicode) handling.

You can verify it by running a search: all accented characters in the result list will appear as ?

Tumer and I (Paul) discovered that the USMARC.pm package re-decodes UTF-8 data that is already in UTF-8. To fix this problem:

  cd /usr/lib/perl/site_perl/5.8.?/MARC/File/
  vi USMARC.pm
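
The site_perl path differs between systems and Perl versions, so it may be easier to ask Perl where it loads the module from. This is a small sketch; perl_module_path is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the file a Perl module is loaded from.
perl_module_path() {
    # %INC maps "MARC/File/USMARC.pm"-style keys to the file actually loaded.
    perl -M"$1" -e '(my $f = $ARGV[0]) =~ s{::}{/}g; print $INC{"$f.pm"}, "\n"' "$1" 2>/dev/null
}

# Example: find the USMARC.pm that Perl actually uses.
perl_module_path MARC::File::USMARC || echo "MARC::File::USMARC not found" >&2
```

Make a backup copy of the reported file before editing it.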

At line 171, you'll find:

      # if utf8 the we encode the string as utf8
      if ( $marc->encoding() eq 'UTF-8' ) {
          $tagdata = marc_to_utf8( $tagdata );
      }

Just comment out those lines:

      #        if ( $marc->encoding() eq 'UTF-8' ) {
      #            $tagdata = marc_to_utf8( $tagdata );
      #        }

Yes, it's that simple. :-)