Testing your data exported from Eclipse for the MARC header error

The script below parses the leader (header) of the first record, works out where the directory ends, and compares the resulting data start position with the value stored in the leader (bytes 12 to 16). NB: It does not test every record in your data. I didn't see a need for this at the time, but if anyone finds that random records are not getting imported then I will modify this script to test all of them.
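As a rough illustration of the arithmetic behind the check, here is a minimal sketch based on the standard MARC layout: a 24-byte leader, then one 12-byte directory entry per field, then a single terminator byte. The function name `expected_data_start` is made up for this example and is not part of the script.

```python
LEADER_LEN = 24       # the leader is always 24 bytes
ENTRY_LEN = 12        # each directory entry: 3-byte tag, 4-byte length, 5-byte offset
TERMINATOR_LEN = 1    # the directory ends with a single field terminator byte


def expected_data_start(field_count):
    """Return the data start position for a record with field_count fields."""
    return LEADER_LEN + ENTRY_LEN * field_count + TERMINATOR_LEN


# A record with 10 fields should carry "00145" in leader bytes 12 to 16.
print("%05d" % expected_data_start(10))
```

If the five digits stored in the leader differ from this value, the record has the header error.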

As always, you use this script at your own risk and I will not be held responsible for anything that may or may not go wrong!

The Script

 
import sys
 
if len(sys.argv) != 2:
        print("Usage: %s <Eclipse MARC file>" % sys.argv[0])
        sys.exit(1)
 
# Open in binary mode so that byte offsets are not altered by newline translation
marcfile = open(sys.argv[1], "rb")
 
# Read in the record length
record_length = marcfile.read(5)
 
# Convert to integer
record_length = int(record_length) 
 
# go back to the start of the record
marcfile.seek(0)
 
# Read in all the data
data = marcfile.read(record_length)
marcfile.close()
 
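# Skip the 24-byte Leader; the directory starts immediately after it
dic_start = 24
dic_current = dic_start

# Loop through the directory entries
while True:
        # Look at the first byte of the next entry (a slice works for
        # both str and bytes data and is safe at the end of the data)
        lookahead = data[dic_current:dic_current + 1]

        # If it's a digit, it starts another tag entry (12 bytes)
        if lookahead.isdigit():
                dic_current += 12
        else:
                break

# Add one for the directory terminating character
data_start = dic_current + 1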
 
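# Compare the calculated value to the value stored in the Leader (bytes 12 to 16)
computed_data_start = "%05d" % data_start
data_start_in_leader = data[12:17]
if not isinstance(data_start_in_leader, str):
        # Binary data read under Python 3 is bytes; decode it before comparing
        data_start_in_leader = data_start_in_leader.decode("ascii", "replace")

if computed_data_start != data_start_in_leader:
        print("Your Eclipse data contains the header error (found %s, expected %s)." % (data_start_in_leader, computed_data_start))
else:
        print("Your data _appears_ to be ok.")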
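
Testing every record

The script above only looks at the first record. What follows is a minimal sketch of how the same check might be extended to a whole file. It is not the author's script, and it makes two assumptions: that the record length in each leader (bytes 0 to 4) is correct, so it can be used to step from one record to the next, and that the first field terminator byte (0x1E) after the leader marks the end of the directory. It finds that terminator directly instead of walking the directory entries. The names `computed_base_address` and `find_header_errors` are made up for this example.

```python
LEADER_LEN = 24              # fixed leader size
FIELD_TERMINATOR = b"\x1e"   # ends the directory and every field


def computed_base_address(record):
    """Return the data start implied by the directory, or None if no terminator is found."""
    end = record.find(FIELD_TERMINATOR, LEADER_LEN)
    if end == -1:
        return None
    return end + 1


def find_header_errors(data):
    """Yield (record_number, leader_value, computed_value) for each mismatching record."""
    offset = 0
    number = 0
    while offset + LEADER_LEN <= len(data):
        try:
            length = int(data[offset:offset + 5])
        except ValueError:
            break  # not a numeric record length; stop rather than guess
        if length < LEADER_LEN:
            break  # a record cannot be shorter than its leader
        number += 1
        record = data[offset:offset + length]
        leader_value = record[12:17].decode("ascii", "replace")
        computed = computed_base_address(record)
        expected = "%05d" % computed if computed is not None else "?????"
        if leader_value != expected:
            yield number, leader_value, expected
        offset += length
```

To use it, read the whole file in binary mode, pass the bytes to `find_header_errors`, and print each tuple it yields. Each tuple gives the record number (counting from 1), the value found in the leader, and the value the directory implies. If the record lengths in your data are also wrong, this sketch will lose its place, so treat a clean result with the same caution as the original script.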