← Index
NYTProf Performance Profile   « block view • line view • sub view »
For conv.pl
  Run on Sun Nov 14 21:27:43 2010
Reported on Sun Nov 14 21:29:11 2010

Filename/usr/share/perl5/MARC/File/XML.pm
StatementsExecuted 125057 statements in 794ms
Subroutines
Calls P F Exclusive
Time
Inclusive
Time
Subroutine
500011284ms30.5sMARC::File::XML::::decodeMARC::File::XML::decode
500011210ms411msMARC::File::XML::::_nextMARC::File::XML::_next
500011117ms117msMARC::File::XML::::CORE:substMARC::File::XML::CORE:subst (opcode)
50001168.7ms87.8msMARC::File::XML::::decideMARC8BinaryMARC::File::XML::decideMARC8Binary
50001163.3ms63.3msMARC::File::XML::::CORE:readlineMARC::File::XML::CORE:readline (opcode)
200004143.1ms43.1msMARC::File::XML::::CORE:matchMARC::File::XML::CORE:match (opcode)
5000118.43ms8.43msMARC::File::XML::::CORE:eofMARC::File::XML::CORE:eof (opcode)
1112.90ms9.52msMARC::File::XML::::BEGIN@7MARC::File::XML::BEGIN@7
1112.68ms2.77msMARC::File::XML::::BEGIN@3MARC::File::XML::BEGIN@3
1111.48ms2.99msMARC::File::XML::::BEGIN@6MARC::File::XML::BEGIN@6
1111.01ms115msMARC::File::XML::::BEGIN@9MARC::File::XML::BEGIN@9
111574µs1.27msMARC::File::XML::::BEGIN@5MARC::File::XML::BEGIN@5
111448µs499µsMARC::File::XML::::BEGIN@4MARC::File::XML::BEGIN@4
11139µs39µsMARC::File::XML::::closeMARC::File::XML::close
11123µs382µsMARC::File::XML::::BEGIN@13MARC::File::XML::BEGIN@13
11118µs95µsMARC::File::XML::::BEGIN@10MARC::File::XML::BEGIN@10
11114µs67µsMARC::File::XML::::BEGIN@14MARC::File::XML::BEGIN@14
11113µs75µsMARC::File::XML::::BEGIN@12MARC::File::XML::BEGIN@12
11113µs52µsMARC::File::XML::::DESTROYMARC::File::XML::DESTROY
11110µs10µsMARC::File::XML::::importMARC::File::XML::import
1118µs8µsMARC::File::XML::::BEGIN@8MARC::File::XML::BEGIN@8
1117µs7µsMARC::File::XML::::BEGIN@15MARC::File::XML::BEGIN@15
1117µs7µsMARC::File::XML::::CORE:qrMARC::File::XML::CORE:qr (opcode)
0000s0sMARC::File::XML::::_unimarc_encodingMARC::File::XML::_unimarc_encoding
0000s0sMARC::File::XML::::default_record_formatMARC::File::XML::default_record_format
0000s0sMARC::File::XML::::encodeMARC::File::XML::encode
0000s0sMARC::File::XML::::escapeMARC::File::XML::escape
0000s0sMARC::File::XML::::footerMARC::File::XML::footer
0000s0sMARC::File::XML::::headerMARC::File::XML::header
0000s0sMARC::File::XML::::outMARC::File::XML::out
0000s0sMARC::File::XML::::recordMARC::File::XML::record
0000s0sMARC::File::XML::::writeMARC::File::XML::write
0000s0sMARC::Record::::as_xml MARC::Record::as_xml
0000s0sMARC::Record::::as_xml_record MARC::Record::as_xml_record
0000s0sMARC::Record::::new_from_xml MARC::Record::new_from_xml
Call graph for these subroutines as a Graphviz dot language file.
Line State
ments
Time
on line
Calls Time
in subs
Code
1package MARC::File::XML;
2
332.49ms22.82ms
# spent 2.77ms (2.68+95µs) within MARC::File::XML::BEGIN@3 which was called: # once (2.68ms+95µs) by main::BEGIN@3 at line 3
use warnings;
# spent 2.77ms making 1 call to MARC::File::XML::BEGIN@3 # spent 44µs making 1 call to warnings::import
43437µs2507µs
# spent 499µs (448+51) within MARC::File::XML::BEGIN@4 which was called: # once (448µs+51µs) by main::BEGIN@3 at line 4
use strict;
# spent 499µs making 1 call to MARC::File::XML::BEGIN@4 # spent 8µs making 1 call to strict::import
53167µs21.34ms
# spent 1.27ms (574µs+692µs) within MARC::File::XML::BEGIN@5 which was called: # once (574µs+692µs) by main::BEGIN@3 at line 5
use vars qw( $VERSION %_load_args );
# spent 1.27ms making 1 call to MARC::File::XML::BEGIN@5 # spent 76µs making 1 call to vars::import
63187µs24.41ms
# spent 2.99ms (1.48+1.51) within MARC::File::XML::BEGIN@6 which was called: # once (1.48ms+1.51ms) by main::BEGIN@3 at line 6
use base qw( MARC::File );
# spent 2.99ms making 1 call to MARC::File::XML::BEGIN@6 # spent 1.42ms making 1 call to base::import
73150µs29.56ms
# spent 9.52ms (2.90+6.62) within MARC::File::XML::BEGIN@7 which was called: # once (2.90ms+6.62ms) by main::BEGIN@3 at line 7
use MARC::Record;
# spent 9.52ms making 1 call to MARC::File::XML::BEGIN@7 # spent 42µs making 1 call to Exporter::import
8341µs18µs
# spent 8µs within MARC::File::XML::BEGIN@8 which was called: # once (8µs+0s) by main::BEGIN@3 at line 8
use MARC::Field;
# spent 8µs making 1 call to MARC::File::XML::BEGIN@8
93175µs1115ms
# spent 115ms (1.01+114) within MARC::File::XML::BEGIN@9 which was called: # once (1.01ms+114ms) by main::BEGIN@3 at line 9
use MARC::File::SAX;
# spent 115ms making 1 call to MARC::File::XML::BEGIN@9
10347µs2172µs
# spent 95µs (18+77) within MARC::File::XML::BEGIN@10 which was called: # once (18µs+77µs) by main::BEGIN@3 at line 10
use XML::SAX qw(Namespaces Validation);
# spent 95µs making 1 call to MARC::File::XML::BEGIN@10 # spent 77µs making 1 call to Exporter::import
11
12339µs2137µs
# spent 75µs (13+62) within MARC::File::XML::BEGIN@12 which was called: # once (13µs+62µs) by main::BEGIN@3 at line 12
use MARC::Charset qw( marc8_to_utf8 utf8_to_marc8 );
# spent 75µs making 1 call to MARC::File::XML::BEGIN@12 # spent 62µs making 1 call to Exporter::import
13355µs2741µs
# spent 382µs (23+359) within MARC::File::XML::BEGIN@13 which was called: # once (23µs+359µs) by main::BEGIN@3 at line 13
use IO::File;
# spent 382µs making 1 call to MARC::File::XML::BEGIN@13 # spent 359µs making 1 call to Exporter::import
14336µs2120µs
# spent 67µs (14+53) within MARC::File::XML::BEGIN@14 which was called: # once (14µs+53µs) by main::BEGIN@3 at line 14
use Carp qw( croak );
# spent 67µs making 1 call to MARC::File::XML::BEGIN@14 # spent 53µs making 1 call to Exporter::import
1532.25ms17µs
# spent 7µs within MARC::File::XML::BEGIN@15 which was called: # once (7µs+0s) by main::BEGIN@3 at line 15
use Encode ();
# spent 7µs making 1 call to MARC::File::XML::BEGIN@15
16
1711µs$VERSION = '0.92';
18
19110µs12.00msmy $factory = XML::SAX::ParserFactory->new();
# spent 2.00ms making 1 call to XML::SAX::ParserFactory::new
2017µs114µs$factory->require_feature(Namespaces);
# spent 14µs making 1 call to XML::SAX::ParserFactory::require_feature
21
22
# spent 10µs within MARC::File::XML::import which was called: # once (10µs+0s) by main::BEGIN@3 at line 3 of conv.pl
sub import {
2311µs my $class = shift;
2413µs %_load_args = @_;
2512µs $_load_args{ DefaultEncoding } ||= 'UTF-8';
2617µs $_load_args{ RecordFormat } ||= 'USMARC';
27}
28
29=head1 NAME
30
- -
101sub default_record_format {
102 my $self = shift;
103 my $format = shift;
104
105 $_load_args{RecordFormat} = $format if ($format);
106
107 return $_load_args{RecordFormat};
108}
109
110
111=head2 as_xml()
112
- -
121sub MARC::Record::as_xml {
122 my $record = shift;
123 my $format = shift || $_load_args{RecordFormat};
124 return( MARC::File::XML::encode( $record, $format ) );
125}
126
127=head2 as_xml_record([$format])
128
- -
137sub MARC::Record::as_xml_record {
138 my $record = shift;
139 my $format = shift || $_load_args{RecordFormat};
140 return( MARC::File::XML::encode( $record, $format, 1 ) );
141}
142
143=head2 new_from_xml([$encoding, $format])
144
- -
157sub MARC::Record::new_from_xml {
158 my $xml = shift;
159 ## to allow calling as MARC::Record::new_from_xml()
160 ## or MARC::Record->new_from_xml()
161 $xml = shift if ( ref($xml) || ($xml eq "MARC::Record") );
162
163 my $enc = shift || $_load_args{BinaryEncoding};
164 my $format = shift || $_load_args{RecordFormat};
165 return( MARC::File::XML::decode( $xml, $enc, $format ) );
166}
167
168=pod
169
- -
184sub out {
185 my ( $class, $filename, $enc ) = @_;
186 my $fh = IO::File->new( ">$filename" ) or croak( $! );
187 $enc ||= $_load_args{DefaultEncoding};
188
189 if ($enc =~ /^utf-?8$/oi) {
190 $fh->binmode(':utf8');
191 } else {
192 $fh->binmode(':raw');
193 }
194
195 my %self = (
196 filename => $filename,
197 fh => $fh,
198 header => 0,
199 encoding => $enc
200 );
201 return( bless \%self, ref( $class ) || $class );
202}
203
204=head2 write()
205
- -
214sub write {
215 my ( $self, $record, $enc ) = @_;
216 if ( ! $self->{ fh } ) {
217 croak( "MARC::File::XML object not open for writing" );
218 }
219 if ( ! $record ) {
220 croak( "must pass write() a MARC::Record object" );
221 }
222 ## print the XML header if we haven't already
223 if ( ! $self->{ header } ) {
224 $enc ||= $self->{ encoding } || $_load_args{DefaultEncoding};
225 $self->{ fh }->print( header( $enc ) );
226 $self->{ header } = 1;
227 }
228 ## print out the record
229 $self->{ fh }->print( record( $record ) ) || croak( $! );
230 return( 1 );
231}
232
233=head2 close()
234
- -
241
# spent 39µs within MARC::File::XML::close which was called: # once (39µs+0s) by MARC::File::XML::DESTROY at line 255
sub close {
24212µs my $self = shift;
24313µs if ( $self->{ fh } ) {
24412µs $self->{ fh }->print( footer() ) if $self->{ header };
245125µs $self->{ fh } = undef;
24612µs $self->{ filename } = undef;
24711µs $self->{ header } = undef;
248 }
24918µs return( 1 );
250}
251
252## makes sure that the XML file is closed off
253
254
# spent 52µs (13+39) within MARC::File::XML::DESTROY which was called: # once (13µs+39µs) by main::NULL at line 0 of conv.pl
sub DESTROY {
255111µs139µs shift->close();
# spent 39µs making 1 call to MARC::File::XML::close
256}
257
258=pod
259
- -
278sub header {
279 my $enc = shift;
280 $enc = shift if ( $enc && (ref($enc) || ($enc eq "MARC::File::XML")) );
281 $enc ||= 'UTF-8';
282 return( <<MARC_XML_HEADER );
283<?xml version="1.0" encoding="$enc"?>
284<collection
285 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
286 xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
287 xmlns="http://www.loc.gov/MARC21/slim">
288MARC_XML_HEADER
289}
290
291=head2 footer()
292
- -
297sub footer {
298 return( "</collection>" );
299}
300
301=head2 record()
302
- -
307sub record {
308 my $record = shift;
309 my $format = shift;
310 my $include_full_record_header = shift;
311 my $enc = shift;
312
313 $format ||= $_load_args{RecordFormat};
314
315 my $_transcode = 0;
316 my $ldr = $record->leader;
317 my $original_encoding = substr($ldr,9,1);
318
319 # Does the record think it is already Unicode?
320 if ($original_encoding ne 'a' && lc($format) !~ /^unimarc/o) {
321 # If not, we'll make it so
322 $_transcode++;
323 substr($ldr,9,1,'a');
324 $record->leader( $ldr );
325 }
326
327 my @xml = ();
328
329 if ($include_full_record_header) {
330 push @xml, <<HEADER
331<?xml version="1.0" encoding="$enc"?>
332<record
333 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
334 xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
335 xmlns="http://www.loc.gov/MARC21/slim">
336HEADER
337
338 } else {
339 push( @xml, "<record>" );
340 }
341
342 push( @xml, " <leader>" . escape( $record->leader ) . "</leader>" );
343
344 foreach my $field ( $record->fields() ) {
345 my ($tag) = escape( $field->tag() );
346 if ( $field->is_control_field() ) {
347 my $data = $field->data;
348 push( @xml, qq( <controlfield tag="$tag">) .
349 escape( ($_transcode ? marc8_to_utf8($data) : $data) ). qq(</controlfield>) );
350 } else {
351 my ($i1) = escape( $field->indicator( 1 ) );
352 my ($i2) = escape( $field->indicator( 2 ) );
353 push( @xml, qq( <datafield tag="$tag" ind1="$i1" ind2="$i2">) );
354 foreach my $subfield ( $field->subfields() ) {
355 my ( $code, $data ) = ( escape( $$subfield[0] ), $$subfield[1] );
356 push( @xml, qq( <subfield code="$code">).
357 escape( ($_transcode ? marc8_to_utf8($data) : $data) ).qq(</subfield>) );
358 }
359 push( @xml, " </datafield>" );
360 }
361 }
362 push( @xml, "</record>\n" );
363
364 if ($_transcode) {
365 substr($ldr,9,1,$original_encoding);
366 $record->leader( $ldr );
367 }
368
369 return( join( "\n", @xml ) );
370}
371
37213µsmy %ESCAPES = (
373 '&' => '&amp;',
374 '<' => '&lt;',
375 '>' => '&gt;',
376);
37737µsmy $ESCAPE_REGEX =
378 eval 'qr/' .
379152µs join( '|', map { $_ = "\Q$_\E" } keys %ESCAPES ) .
# spent 20µs executing statements in string eval
380 '/;'
381 ;
382
383sub escape {
384 my $string = shift;
385 return '' if ! defined $string or $string eq '';
386 $string =~ s/($ESCAPE_REGEX)/$ESCAPES{$1}/oge;
387 return( $string );
388}
389
390
# spent 411ms (210+202) within MARC::File::XML::_next which was called 5000 times, avg 82µs/call: # 5000 times (210ms+202ms) by MARC::File::next at line 104 of MARC/File.pm, avg 82µs/call
sub _next {
39150005.40ms my $self = shift;
39250006.13ms my $fh = $self->{ fh };
393
394 ## return undef at the end of the file
395500038.2ms50008.43ms return if eof($fh);
# spent 8.43ms making 5000 calls to MARC::File::XML::CORE:eof, avg 2µs/call
396
397 ## get a chunk of xml for a record
398500018.5ms local $/ = '</record>';
399500086.4ms500063.3ms my $xml = <$fh>;
# spent 63.3ms making 5000 calls to MARC::File::XML::CORE:readline, avg 13µs/call
400
401 ## trim stuff before the start record element
4025000138ms5000117ms $xml =~ s/.*<record.*?>/<record>/s;
# spent 117ms making 5000 calls to MARC::File::XML::CORE:subst, avg 23µs/call
403
404 ## return undef if there isn't a good chunk of xml
405500031.4ms500012.5ms return if ( $xml !~ m|<record>.*</record>|s );
# spent 12.5ms making 5000 calls to MARC::File::XML::CORE:match, avg 3µs/call
406
407 ## return the chunk of xml
408500093.4ms return( $xml );
409}
410
411=head2 decode()
412
- -
420
# spent 30.5s (284ms+30.2) within MARC::File::XML::decode which was called 5000 times, avg 6.09ms/call: # 5000 times (284ms+30.2s) by MARC::File::next at line 105 of MARC/File.pm, avg 6.09ms/call
sub decode {
42150004.83ms my $text;
42250006.14ms my $location = '';
42350005.72ms my $self = shift;
424
425 ## see MARC::File::USMARC::decode for explanation of what's going on
426 ## here
427500038.8ms500011.4ms if ( ref($self) =~ /^MARC::File/ ) {
# spent 11.4ms making 5000 calls to MARC::File::XML::CORE:match, avg 2µs/call
428500010.1ms $location = 'in record '.$self->{recnum};
429500010.2ms $text = shift;
430 } else {
431 $location = 'in record 1';
432 $text = $self=~/MARC::File/ ? shift : $self;
433 }
434
43550008.70ms my $enc = shift || $_load_args{BinaryEncoding};
43650007.31ms my $format = shift || $_load_args{RecordFormat};
437
438500026.2ms500040.2ms my $handler = MARC::File::SAX->new();
# spent 40.2ms making 5000 calls to MARC::File::SAX::new, avg 8µs/call
439500027.4ms5000696ms my $parser = $factory->parser(
# spent 696ms making 5000 calls to XML::SAX::ParserFactory::parser, avg 139µs/call
440 Handler => $handler,
441 ProtocolEncoding => $_load_args{DefaultEncoding}
442 );
443500028.9ms500087.8ms $parser->{ Handler }{ toMARC8 } = decideMARC8Binary($format,$enc);
# spent 87.8ms making 5000 calls to MARC::File::XML::decideMARC8Binary, avg 18µs/call
444
445500022.9ms500029.3s $parser->parse_string( $text );
# spent 29.3s making 5000 calls to XML::SAX::Base::parse_string, avg 5.86ms/call
446
447500083.3ms500023.7ms return( $handler->record() );
# spent 23.7ms making 5000 calls to MARC::File::SAX::record, avg 5µs/call
448}
449
450
# spent 87.8ms (68.7+19.2) within MARC::File::XML::decideMARC8Binary which was called 5000 times, avg 18µs/call: # 5000 times (68.7ms+19.2ms) by MARC::File::XML::decode at line 443, avg 18µs/call
sub decideMARC8Binary {
45150006.60ms my $format = shift;
45250005.49ms my $enc = shift;
453
454500028.0ms50005.82ms return 0 if (defined($format) && lc($format) =~ /^unimarc/o);
# spent 5.82ms making 5000 calls to MARC::File::XML::CORE:match, avg 1µs/call
455500050.0ms500013.3ms return 0 if (defined($enc) && lc($enc) =~ /^utf-?8/o);
# spent 13.3ms making 5000 calls to MARC::File::XML::CORE:match, avg 3µs/call
456 return 1;
457}
458
459
460=head2 encode()
461
- -
469sub encode {
470 my $record = shift;
471 my $format = shift || $_load_args{RecordFormat};
472 my $without_collection_header = shift;
473 my $enc = shift || $_load_args{DefaultEncoding};
474
475 if (lc($format) =~ /^unimarc/o) {
476 $enc = _unimarc_encoding( $format => $record );
477 }
478
479 my @xml = ();
480 push( @xml, header( $enc ) ) unless ($without_collection_header);
481 # verbose, but naming the header output flags this way to avoid
482 # the potential confusion identified in CPAN bug #34082
483 # http://rt.cpan.org/Public/Bug/Display.html?id=34082
484 my $include_full_record_header = ($without_collection_header) ? 1 : 0;
485 push( @xml, record( $record, $format, $include_full_record_header, $enc ) );
486 push( @xml, footer() ) unless ($without_collection_header);
487
488 return( join( "\n", @xml ) );
489}
490
491sub _unimarc_encoding {
492 my $f = shift;
493 my $r = shift;
494
495 my $pos = 26;
496 $pos = 13 if (lc($f) eq 'unimarcauth');
497
498 my $enc = substr( $r->subfield(100 => 'a'), $pos, 2 );
499
500 if ($enc eq '01' || $enc eq '03') {
501 return 'ISO-8859-1';
502 } elsif ($enc eq '50') {
503 return 'UTF-8';
504 } else {
505 die "Unsupported UNIMARC character encoding [$enc] for XML output for $f; 100\$a -> " . $r->subfield(100 => 'a');
506 }
507}
508
509=head1 TODO
510
- -
543111µs1;
 
# spent 8.43ms within MARC::File::XML::CORE:eof which was called 5000 times, avg 2µs/call: # 5000 times (8.43ms+0s) by MARC::File::XML::_next at line 395, avg 2µs/call
sub MARC::File::XML::CORE:eof; # opcode
# spent 43.1ms within MARC::File::XML::CORE:match which was called 20000 times, avg 2µs/call: # 5000 times (13.3ms+0s) by MARC::File::XML::decideMARC8Binary at line 455, avg 3µs/call # 5000 times (12.5ms+0s) by MARC::File::XML::_next at line 405, avg 3µs/call # 5000 times (11.4ms+0s) by MARC::File::XML::decode at line 427, avg 2µs/call # 5000 times (5.82ms+0s) by MARC::File::XML::decideMARC8Binary at line 454, avg 1µs/call
sub MARC::File::XML::CORE:match; # opcode
# spent 7µs within MARC::File::XML::CORE:qr which was called: # once (7µs+0s) by main::BEGIN@3 at line 1 of (eval 9)[MARC/File/XML.pm:379]
sub MARC::File::XML::CORE:qr; # opcode
# spent 63.3ms within MARC::File::XML::CORE:readline which was called 5000 times, avg 13µs/call: # 5000 times (63.3ms+0s) by MARC::File::XML::_next at line 399, avg 13µs/call
sub MARC::File::XML::CORE:readline; # opcode
# spent 117ms within MARC::File::XML::CORE:subst which was called 5000 times, avg 23µs/call: # 5000 times (117ms+0s) by MARC::File::XML::_next at line 402, avg 23µs/call
sub MARC::File::XML::CORE:subst; # opcode