Searching Group Agenda for 2005, June 21

MeetingNotes05june21 Meeting notes for this meeting

Chris's concerns (from his blog):

Currently the search works well, in that it's functionally correct and returns correct and complete results. The main points I'll be making are:

We have two audiences: the librarians and the public
Most important is that we return correct results; speed comes a close second
The public do not care (for the most part) why or how the results are derived, just as long as they are right
Innovative tools will win us a little fame

I can't stress correct results enough: it's very frustrating to search for an item, see that it's on the shelf, then find it isn't.

I agree that our goals are 1. accuracy and 2. speed. As I mentioned recently to a French audience, Koha's search results are currently 'correct and complete' for small collections. But when you have 150K records and a search returns an unsorted list of (say) 5,000 matching items, that's just not a feasible system. NPL actually has a sorting routine for just such a condition (using the current MARC setup); however, the overhead of performing it means searches that large take 1-3 minutes to return results. So for larger collections, speed and accuracy are closely related. – Joshua

1. Zebra

Zebra is an indexing and search retrieval back end
It indexes many formats including native MARC21 and UNIMARC
The retrieval engine is a fully standards-compliant Z39.50 server
Z39.50 is not the _latest_ technology for searching … but:
Zebra when combined with yaz-proxy (http://www.indexdata.dk/yazproxy) can convert CQL to RPN queries (Z39.50 default query type)
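
To make the CQL-to-RPN idea concrete, here is a minimal sketch of what such a conversion looks like for very simple queries. It is not how yaz-proxy does it (yaz-proxy is driven by its own configuration); the function names are made up for illustration, the index-to-attribute table is an assumption based on the usual Bib-1 "use" attributes (title = 4, author = 1003, subject = 21, any = 1016), and only single-word `index=term` clauses joined by `and`, `or` or `not` are handled.

```python
# Assumed mapping from CQL index names to Bib-1 "use" attribute values.
# A real server defines this mapping in its own configuration.
BIB1_USE = {
    "dc.title": 4,
    "dc.creator": 1003,
    "dc.subject": 21,
    "cql.serverChoice": 1016,
}

BOOLEANS = ("and", "or", "not")


def clause_to_pqf(clause):
    """Turn one 'index=term' clause into a PQF attribute expression."""
    index, sep, term = clause.partition("=")
    if not sep or not term:
        raise ValueError("expected index=term, got %r" % clause)
    if index not in BIB1_USE:
        raise ValueError("unknown index %r" % index)
    return "@attr 1=%d %s" % (BIB1_USE[index], term)


def cql_to_pqf(cql):
    """Translate a flat CQL query into prefix-notation RPN (PQF).

    CQL booleans have equal precedence and group left to right, so
    'a and b or c' becomes '@or @and A B C'.
    """
    tokens = cql.split()
    if len(tokens) % 2 == 0:
        # Empty query, or a boolean operator with nothing after it.
        raise ValueError("malformed query %r" % cql)
    pqf = clause_to_pqf(tokens[0])
    for op, clause in zip(tokens[1::2], tokens[2::2]):
        op = op.lower()
        if op not in BOOLEANS:
            raise ValueError("unknown boolean %r" % op)
        # Prefix notation: the operator goes in front of both operands.
        pqf = "@%s %s %s" % (op, pqf, clause_to_pqf(clause))
    return pqf


if __name__ == "__main__":
    print(cql_to_pqf("dc.title=zebra and dc.creator=stephenson"))
```

The point is that the user-facing query stays readable (`dc.title=zebra`), while the Z39.50 server still receives the attribute-based query it expects.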

So our questions are:

Is CQL (Common Query Language) the way to go?

http://www.loc.gov/z3950/agency/zing/cql/

Should it replace the MARC tables in Koha?

2. OpenSearch

With some additional elements, OpenSearch could be the ultimate federated search engine. Mike and I have also found a way to pass CQL queries through OpenSearch (though that's not in the original spec).
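
As a rough illustration of passing CQL through OpenSearch, one possible approach is simply to URL-encode the CQL string and send it as the search terms. This is only a sketch and not necessarily the method Mike and Joshua found: the endpoint `http://example.org/opensearch`, the function name, and the parameter names `searchTerms`, `startIndex` and `count` are all assumptions (a real OpenSearch site declares its own URL template).

```python
from urllib.parse import urlencode


def opensearch_url(base_url, cql, start_index=1, count=20):
    """Build a search URL that carries a CQL query as the search terms.

    The parameter names are hypothetical; a real site publishes its
    own URL template in its OpenSearch description.
    """
    params = {
        "searchTerms": cql,         # the CQL string, percent-encoded by urlencode
        "startIndex": start_index,  # first result to return
        "count": count,             # results per page
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(opensearch_url(
        "http://example.org/opensearch",  # hypothetical endpoint
        "dc.title=zebra and dc.creator=stephenson",
    ))
```

The encoding step matters because CQL uses `=` and spaces, which would otherwise be mistaken for parts of the URL's own query string.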

Other References:

If you're new to Koha you may find these references useful in evaluating these topics.

Demos with 150K Records:

Current OPAC: http://search.athenscounty.lib.oh.us
Plucene demo: http://search.athenscounty.lib.oh.us/cgi-bin/koha/plucene/search.cgi?query=stephenson
marc_words demo:
mysqlfulltext demo:
Zebra: http://liblime.com/zap/advanced.html