Script to fix MARC21 files exported by Eclipse

Disclaimer

USE THIS SCRIPT AT YOUR OWN RISK

Keep a backup of everything and don't blame me if it goes wrong! :-) Having said that, I have used this script to successfully import 5900+ records into Koha. The system I used was Ubuntu 6.06 LTS with the standard Python install. The script has not been tested under any other configuration, but I'd be glad to hear about any experiences so that I can improve it if required.

Requirements

As far as I'm aware, the only things you need are Python and a broken .mrc file.

Tips

When I was testing this script, I took the first MARC record out of our .mrc file and used just that record. It is far easier to check that this works with one record than with 5900+! After the fixing process, view the original and fixed records side by side. It is fairly easy to see the changes that have been made: only bytes 12-16 of the record (counting from 0) should differ, since that is the only part of each record the script rewrites.
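
The single-record check described above can also be scripted. What follows is only a minimal sketch, written for Python 3 rather than the Python the script itself was run under. The function names first_record and changed_offsets are made up for illustration, and the sketch assumes that bytes 0-4 of the file hold the record length as ASCII digits, which is the same assumption the fixing script makes.

```python
def first_record(raw):
    """Return the bytes of the first record in raw .mrc data."""
    # Assumption: bytes 0-4 hold the record length as ASCII digits
    length = int(raw[:5])
    return raw[:length]


def changed_offsets(original, fixed):
    """Return the byte offsets at which two equal-length records differ."""
    if len(original) != len(fixed):
        raise ValueError("records differ in length")
    return [i for i, (a, b) in enumerate(zip(original, fixed)) if a != b]
```

Read both the original file and the .fixed file in binary mode, pass the first record of each to changed_offsets, and every offset it returns should lie between 12 and 16.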

The Script

import sys
 
# Check for filename 
if len(sys.argv) != 2:
	print("Usage: %s <Eclipse MARC file>" % sys.argv[0])
	sys.exit(-1)
 
# Open the file for reading (binary mode, so no byte is translated)
marcfile=open(sys.argv[1],"rb")
 
# Open a file for writing (binary mode, to match)
fixedmarcfile=open("%s.fixed" % sys.argv[1],"wb")
 
# Initialise variables
this_record_start = 0
no_of_records_processed = 0
 
# Loop over all records
while 1:
	# Read in the record length
	record_length = marcfile.read(5)
 
	# If this is blank, we have run out of data
	if len(record_length.strip()) == 0:
		break
 
	# Convert to integer
	record_length = int(record_length) 
 
	# go back to the start of the record
	marcfile.seek(this_record_start)
 
	# Read in all the data
	data = marcfile.read(record_length)
 
	# Update the this_record_start variable ready for the next record
	this_record_start += record_length
 
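	# Start at byte 12 of the 24-byte leader; one 12-byte step from
	# here lands on byte 24, where the directory begins
	dic_start = 12
	dic_current = dic_start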
 
	# Loop through the 12-byte directory entries
	while 1:
		# Look at the byte just after the current position
		lookahead = data[dic_current+1]
 
		# If it's a digit, skip over this 12-byte directory entry
		if lookahead.isdigit():
			dic_current += 12
		else:
			break
 
	# Add one for the terminator at the end of the directory
	data_start = dic_current + 1
 
	# Write the corrected data start address into bytes 12-16 of the leader
	output_data = data[0:12] + "%05d" % data_start + data[17:]
 
	# Write new data to file
	fixedmarcfile.write(output_data)
 
	# Increment the record count
	no_of_records_processed += 1
 
# Close the files
marcfile.close()
fixedmarcfile.close()
 
# Print Report
print("Finished processing %d records." % no_of_records_processed)
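
Once the script has run, the .fixed file can be sanity-checked. The sketch below is not part of the original script: it is written for Python 3, the names records and bad_base_addresses are made up for illustration, and it assumes the standard MARC21 layout, in which leader bytes 12-16 hold the base address of data and the directory ends with the field terminator byte 0x1E just before that address.

```python
FIELD_TERMINATOR = 0x1E  # MARC21 field terminator; it also ends the directory


def records(raw):
    """Yield each record from raw .mrc data, using the length in bytes 0-4."""
    pos = 0
    while pos + 5 <= len(raw) and raw[pos:pos + 5].strip():
        length = int(raw[pos:pos + 5])
        if length < 24:
            # A record cannot be shorter than its 24-byte leader
            raise ValueError("implausible record length at offset %d" % pos)
        yield raw[pos:pos + length]
        pos += length


def bad_base_addresses(raw):
    """Return the indexes of records whose base address looks wrong."""
    bad = []
    for index, record in enumerate(records(raw)):
        try:
            base = int(record[12:17])
        except ValueError:
            # Bytes 12-16 are not a number at all
            bad.append(index)
            continue
        # The byte just before the data should be the directory terminator
        if base < 25 or base > len(record) or record[base - 1] != FIELD_TERMINATOR:
            bad.append(index)
    return bad
```

Calling bad_base_addresses on the contents of the .fixed file (read in binary mode) should return an empty list if every record's data start address points just past its directory.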
 
eclipse_fix_marc_python_script.txt · Last modified: 2006/11/08 11:15 by lea
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported