The script below parses the header (the Leader) of the first record, calculates the length of the directory, and compares the resulting data start offset with the value stored in the header. NB: it does not test every record in your data. I didn't see a need for this at the time, but if anyone finds that random records are not getting imported, I will modify the script to test all of them.
As always, you use this script at your own risk and I will not be held responsible for anything that may or may not go wrong!
    import sys

    if len(sys.argv) != 2:
        print("%s: <Eclipse MARC file>" % sys.argv[0])
        sys.exit(-1)

    # Open in binary mode: the lengths stored in the Leader are byte counts
    marcfile = open(sys.argv[1], "rb")

    # Read in the record length
    record_length = marcfile.read(5)

    # Convert to integer
    record_length = int(record_length)

    # Go back to the start of the record
    marcfile.seek(0)

    # Read in all the data
    data = marcfile.read(record_length)
    marcfile.close()

    # Skip the Leader (24 bytes)
    dic_start = 24
    dic_current = dic_start

    # Loop through the tags
    while True:
        # Check the first byte of the next entry
        lookahead = data[dic_current:dic_current + 1]
        # If it's a digit, skip over the tag data (12 bytes)
        if lookahead.isdigit():
            dic_current += 12
        else:
            break

    # Add one for the directory terminating character
    data_start = dic_current + 1

    # Compare the calculated value to the value stored in the Leader
    computed_data_start = "%05d" % data_start
    data_start_in_leader = data[12:17].decode("ascii", "replace")

    if computed_data_start != data_start_in_leader:
        print("Your eclipse data contains the header error (Found %s, expected %s)."
              % (data_start_in_leader, computed_data_start))
    else:
        print("Your data _appears_ to be ok.")
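If you do want to check every record rather than just the first, something along these lines should work. It is only a rough sketch for Python 3, and it assumes the file follows the usual MARC layout: a 24-byte Leader with the record length in bytes 0-4 and the data start in bytes 12-16, followed by a directory that ends with the 0x1E terminator byte. The names `find_header_errors` and `computed_data_start` are made up for this example. Unlike the script above, it looks for the terminator byte instead of checking for digits.

```python
FIELD_TERMINATOR = b"\x1e"  # assumed: standard MARC field/directory terminator
LEADER_LENGTH = 24          # assumed: standard MARC Leader size


def computed_data_start(record):
    """Return the data start offset implied by the directory, or None
    if no directory terminator is found after the Leader."""
    end = record.find(FIELD_TERMINATOR, LEADER_LENGTH)
    if end == -1:
        return None
    # Data starts right after the terminating character
    return end + 1


def find_header_errors(data):
    """Walk every record in `data` (bytes) and return a list of
    (record_index, value_in_leader, computed_value) for each mismatch."""
    errors = []
    offset = 0
    index = 0
    while offset + LEADER_LENGTH <= len(data):
        try:
            record_length = int(data[offset:offset + 5])
        except ValueError:
            break  # unreadable record length, stop rather than guess
        if record_length < LEADER_LENGTH:
            break  # implausible length, would never advance safely
        record = data[offset:offset + record_length]
        stored = record[12:17].decode("ascii", "replace")
        start = computed_data_start(record)
        computed = "%05d" % start if start is not None else None
        if computed != stored:
            errors.append((index, stored, computed))
        offset += record_length
        index += 1
    return errors
```

To use it, read the whole file in binary mode and pass the bytes to `find_header_errors`. An empty list means no record showed the header error. As with the script above, treat the result as a hint rather than proof that the data is fine.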