USE THIS SCRIPT AT YOUR OWN RISK
Keep a backup of everything and don't blame me if it goes wrong! Having said that, I have used this script to successfully import 5900+ records into Koha. The system I used was Ubuntu 6.06 LTS with the standard Python install. It has not been tested under any other configuration, but I'd be glad to hear about any experiences so that I can improve the script if required.
As far as I'm aware, the only things you need are Python and a broken .mrc file.
When I was testing this script, I took the first MARC record out of our .mrc file and used just that record: it is far easier to check that the script works with one record than with 5900+! After the fixing process, view the original and fixed records side by side. The changes are easy to spot, and they should be confined to bytes 12-16 of the leader (the base address of data), which the script recalculates for every record.
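Pulling the first record out of a large file can itself be scripted. This is a minimal sketch, not part of the original script: the function name `extract_first_record` and the output path are illustrative. It relies only on bytes 0-4 of the leader, which hold the total record length in every MARC record, so it works even on a file whose base addresses are broken.

```python
def extract_first_record(path, out_path):
    """Copy the first MARC record in `path` into `out_path`.

    Only leader bytes 0-4 (the record length) are trusted, which is
    the same field the fixing script relies on.
    """
    with open(path, "rb") as marcfile:
        # Leader bytes 0-4 give the length of the whole record
        record_length = int(marcfile.read(5))
        # Go back to the start and read the complete record
        marcfile.seek(0)
        record = marcfile.read(record_length)
    with open(out_path, "wb") as outfile:
        outfile.write(record)
    return len(record)
```

For example, `extract_first_record("records.mrc", "first.mrc")` writes a one-record file that you can run through the fixing script on its own.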
```python
import sys

LEADER_LENGTH = 24          # the directory starts straight after the leader
ENTRY_LENGTH = 12           # each directory entry is 12 bytes long
FIELD_TERMINATOR = b"\x1e"  # marks the end of the directory

# Check for filename
if len(sys.argv) != 2:
    print("%s: <Eclipse MARC file>" % sys.argv[0])
    sys.exit(1)

no_of_records_processed = 0

# Open the input for reading and "<file>.fixed" for writing. Binary mode
# keeps every offset a byte offset, which is what MARC lengths count.
with open(sys.argv[1], "rb") as marcfile, \
        open("%s.fixed" % sys.argv[1], "wb") as fixedmarcfile:
    # Loop over all records
    while True:
        # Read in the record length (leader bytes 0-4)
        record_length = marcfile.read(5)
        # If this is blank, we have run out of data
        if len(record_length.strip()) == 0:
            break
        # Read in the rest of the record
        data = record_length + marcfile.read(int(record_length) - 5)

        # Skip the leader, then step over the 12-byte directory entries
        # until we reach the terminator at the end of the directory
        dic_current = LEADER_LENGTH
        while (dic_current < len(data)
               and data[dic_current:dic_current + 1] != FIELD_TERMINATOR):
            dic_current += ENTRY_LENGTH
        if dic_current >= len(data):
            sys.exit("Record %d: no directory terminator found"
                     % (no_of_records_processed + 1))

        # Add one for the terminator at the end of the directory
        data_start = dic_current + 1
        # Write the base address of data back into leader bytes 12-16
        output_data = data[0:12] + ("%05d" % data_start).encode("ascii") + data[17:]
        fixedmarcfile.write(output_data)
        # Increment the record count
        no_of_records_processed += 1

# Print report
print("Finished processing %d records." % no_of_records_processed)
```
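The side-by-side comparison can also be done programmatically. This is an illustrative sketch rather than part of the original script, and the function names are hypothetical; it assumes you pass it one original record and its fixed counterpart (for example, the single test record before and after fixing). `expected_base_address` searches for the field terminator (hex 1E) that closes the directory, so for a record with n directory entries it returns 24 + 12n + 1.

```python
FIELD_TERMINATOR = b"\x1e"  # ends the MARC directory


def expected_base_address(record):
    """Offset of the first data byte: one past the directory's terminator."""
    return record.index(FIELD_TERMINATOR, 24) + 1


def changed_positions(original, fixed):
    """Byte offsets at which two equal-length records differ."""
    if len(original) != len(fixed):
        raise ValueError("records differ in length")
    return [i for i in range(len(original))
            if original[i:i + 1] != fixed[i:i + 1]]


def check_record(original, fixed):
    """True if only leader bytes 12-16 changed and they hold the base address."""
    stated = int(fixed[12:17])
    changes = changed_positions(original, fixed)
    return (stated == expected_base_address(fixed)
            and all(12 <= i <= 16 for i in changes))
```

As a worked example, a record with two directory entries should end up with a base address of 24 + 2 × 12 + 1 = 49, so its leader bytes 12-16 should read `00049`.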