← Index
NYTProf Performance Profile   « block view • line view • sub view »
For conv.pl
  Run on Sun Nov 14 21:14:18 2010
Reported on Sun Nov 14 21:17:53 2010

Filename/usr/share/perl5/MARC/File/XML.pm
StatementsExecuted 125057 statements in 972ms
Subroutines
Calls P F Exclusive
Time
Inclusive
Time
Subroutine
500011377ms90.3sMARC::File::XML::::decodeMARC::File::XML::decode
500011243ms482msMARC::File::XML::::_nextMARC::File::XML::_next
500011127ms127msMARC::File::XML::::CORE:substMARC::File::XML::CORE:subst (opcode)
50001186.3ms86.3msMARC::File::XML::::CORE:readlineMARC::File::XML::CORE:readline (opcode)
50001175.9ms96.5msMARC::File::XML::::decideMARC8BinaryMARC::File::XML::decideMARC8Binary
200004148.2ms48.2msMARC::File::XML::::CORE:matchMARC::File::XML::CORE:match (opcode)
50001110.8ms10.8msMARC::File::XML::::CORE:eofMARC::File::XML::CORE:eof (opcode)
1112.92ms9.39msMARC::File::XML::::BEGIN@7MARC::File::XML::BEGIN@7
1112.69ms2.79msMARC::File::XML::::BEGIN@3MARC::File::XML::BEGIN@3
1111.44ms2.94msMARC::File::XML::::BEGIN@6MARC::File::XML::BEGIN@6
1111.02ms118msMARC::File::XML::::BEGIN@9MARC::File::XML::BEGIN@9
111563µs1.26msMARC::File::XML::::BEGIN@5MARC::File::XML::BEGIN@5
111461µs509µsMARC::File::XML::::BEGIN@4MARC::File::XML::BEGIN@4
11148µs48µsMARC::File::XML::::closeMARC::File::XML::close
11122µs389µsMARC::File::XML::::BEGIN@13MARC::File::XML::BEGIN@13
11118µs99µsMARC::File::XML::::BEGIN@10MARC::File::XML::BEGIN@10
11117µs68µsMARC::File::XML::::BEGIN@14MARC::File::XML::BEGIN@14
11115µs63µsMARC::File::XML::::DESTROYMARC::File::XML::DESTROY
11113µs76µsMARC::File::XML::::BEGIN@12MARC::File::XML::BEGIN@12
11110µs10µsMARC::File::XML::::importMARC::File::XML::import
1119µs9µsMARC::File::XML::::BEGIN@8MARC::File::XML::BEGIN@8
1118µs8µsMARC::File::XML::::CORE:qrMARC::File::XML::CORE:qr (opcode)
1117µs7µsMARC::File::XML::::BEGIN@15MARC::File::XML::BEGIN@15
0000s0sMARC::File::XML::::_unimarc_encodingMARC::File::XML::_unimarc_encoding
0000s0sMARC::File::XML::::default_record_formatMARC::File::XML::default_record_format
0000s0sMARC::File::XML::::encodeMARC::File::XML::encode
0000s0sMARC::File::XML::::escapeMARC::File::XML::escape
0000s0sMARC::File::XML::::footerMARC::File::XML::footer
0000s0sMARC::File::XML::::headerMARC::File::XML::header
0000s0sMARC::File::XML::::outMARC::File::XML::out
0000s0sMARC::File::XML::::recordMARC::File::XML::record
0000s0sMARC::File::XML::::writeMARC::File::XML::write
0000s0sMARC::Record::::as_xml MARC::Record::as_xml
0000s0sMARC::Record::::as_xml_record MARC::Record::as_xml_record
0000s0sMARC::Record::::new_from_xml MARC::Record::new_from_xml
Call graph for these subroutines as a Graphviz dot language file.
Line State
ments
Time
on line
Calls Time
in subs
Code
1package MARC::File::XML;
2
332.51ms22.83ms
# spent 2.79ms (2.69+95µs) within MARC::File::XML::BEGIN@3 which was called: # once (2.69ms+95µs) by main::BEGIN@3 at line 3
use warnings;
# spent 2.79ms making 1 call to MARC::File::XML::BEGIN@3 # spent 44µs making 1 call to warnings::import
43448µs2515µs
# spent 509µs (461+48) within MARC::File::XML::BEGIN@4 which was called: # once (461µs+48µs) by main::BEGIN@3 at line 4
use strict;
# spent 509µs making 1 call to MARC::File::XML::BEGIN@4 # spent 6µs making 1 call to strict::import
53168µs21.34ms
# spent 1.26ms (563µs+692µs) within MARC::File::XML::BEGIN@5 which was called: # once (563µs+692µs) by main::BEGIN@3 at line 5
use vars qw( $VERSION %_load_args );
# spent 1.26ms making 1 call to MARC::File::XML::BEGIN@5 # spent 83µs making 1 call to vars::import
63186µs24.35ms
# spent 2.94ms (1.44+1.50) within MARC::File::XML::BEGIN@6 which was called: # once (1.44ms+1.50ms) by main::BEGIN@3 at line 6
use base qw( MARC::File );
# spent 2.94ms making 1 call to MARC::File::XML::BEGIN@6 # spent 1.41ms making 1 call to base::import
73158µs29.43ms
# spent 9.39ms (2.92+6.46) within MARC::File::XML::BEGIN@7 which was called: # once (2.92ms+6.46ms) by main::BEGIN@3 at line 7
use MARC::Record;
# spent 9.39ms making 1 call to MARC::File::XML::BEGIN@7 # spent 44µs making 1 call to Exporter::import
8341µs19µs
# spent 9µs within MARC::File::XML::BEGIN@8 which was called: # once (9µs+0s) by main::BEGIN@3 at line 8
use MARC::Field;
# spent 9µs making 1 call to MARC::File::XML::BEGIN@8
93175µs1118ms
# spent 118ms (1.02+117) within MARC::File::XML::BEGIN@9 which was called: # once (1.02ms+117ms) by main::BEGIN@3 at line 9
use MARC::File::SAX;
# spent 118ms making 1 call to MARC::File::XML::BEGIN@9
10348µs2180µs
# spent 99µs (18+81) within MARC::File::XML::BEGIN@10 which was called: # once (18µs+81µs) by main::BEGIN@3 at line 10
use XML::SAX qw(Namespaces Validation);
# spent 99µs making 1 call to MARC::File::XML::BEGIN@10 # spent 81µs making 1 call to Exporter::import
11
12338µs2139µs
# spent 76µs (13+63) within MARC::File::XML::BEGIN@12 which was called: # once (13µs+63µs) by main::BEGIN@3 at line 12
use MARC::Charset qw( marc8_to_utf8 utf8_to_marc8 );
# spent 76µs making 1 call to MARC::File::XML::BEGIN@12 # spent 63µs making 1 call to Exporter::import
13348µs2756µs
# spent 389µs (22+367) within MARC::File::XML::BEGIN@13 which was called: # once (22µs+367µs) by main::BEGIN@3 at line 13
use IO::File;
# spent 389µs making 1 call to MARC::File::XML::BEGIN@13 # spent 367µs making 1 call to Exporter::import
14338µs2119µs
# spent 68µs (17+51) within MARC::File::XML::BEGIN@14 which was called: # once (17µs+51µs) by main::BEGIN@3 at line 14
use Carp qw( croak );
# spent 68µs making 1 call to MARC::File::XML::BEGIN@14 # spent 51µs making 1 call to Exporter::import
1532.25ms17µs
# spent 7µs within MARC::File::XML::BEGIN@15 which was called: # once (7µs+0s) by main::BEGIN@3 at line 15
use Encode ();
# spent 7µs making 1 call to MARC::File::XML::BEGIN@15
16
1712µs$VERSION = '0.92';
18
1919µs11.27msmy $factory = XML::SAX::ParserFactory->new();
# spent 1.27ms making 1 call to XML::SAX::ParserFactory::new
2016µs117µs$factory->require_feature(Namespaces);
# spent 17µs making 1 call to XML::SAX::ParserFactory::require_feature
21
22
# spent 10µs within MARC::File::XML::import which was called: # once (10µs+0s) by main::BEGIN@3 at line 3 of conv.pl
sub import {
2311µs my $class = shift;
2413µs %_load_args = @_;
2512µs $_load_args{ DefaultEncoding } ||= 'UTF-8';
2617µs $_load_args{ RecordFormat } ||= 'USMARC';
27}
28
29=head1 NAME
30
- -
101sub default_record_format {
102 my $self = shift;
103 my $format = shift;
104
105 $_load_args{RecordFormat} = $format if ($format);
106
107 return $_load_args{RecordFormat};
108}
109
110
111=head2 as_xml()
112
- -
121sub MARC::Record::as_xml {
122 my $record = shift;
123 my $format = shift || $_load_args{RecordFormat};
124 return( MARC::File::XML::encode( $record, $format ) );
125}
126
127=head2 as_xml_record([$format])
128
- -
137sub MARC::Record::as_xml_record {
138 my $record = shift;
139 my $format = shift || $_load_args{RecordFormat};
140 return( MARC::File::XML::encode( $record, $format, 1 ) );
141}
142
143=head2 new_from_xml([$encoding, $format])
144
- -
157sub MARC::Record::new_from_xml {
158 my $xml = shift;
159 ## to allow calling as MARC::Record::new_from_xml()
160 ## or MARC::Record->new_from_xml()
161 $xml = shift if ( ref($xml) || ($xml eq "MARC::Record") );
162
163 my $enc = shift || $_load_args{BinaryEncoding};
164 my $format = shift || $_load_args{RecordFormat};
165 return( MARC::File::XML::decode( $xml, $enc, $format ) );
166}
167
168=pod
169
- -
184sub out {
185 my ( $class, $filename, $enc ) = @_;
186 my $fh = IO::File->new( ">$filename" ) or croak( $! );
187 $enc ||= $_load_args{DefaultEncoding};
188
189 if ($enc =~ /^utf-?8$/oi) {
190 $fh->binmode(':utf8');
191 } else {
192 $fh->binmode(':raw');
193 }
194
195 my %self = (
196 filename => $filename,
197 fh => $fh,
198 header => 0,
199 encoding => $enc
200 );
201 return( bless \%self, ref( $class ) || $class );
202}
203
204=head2 write()
205
- -
214sub write {
215 my ( $self, $record, $enc ) = @_;
216 if ( ! $self->{ fh } ) {
217 croak( "MARC::File::XML object not open for writing" );
218 }
219 if ( ! $record ) {
220 croak( "must pass write() a MARC::Record object" );
221 }
222 ## print the XML header if we haven't already
223 if ( ! $self->{ header } ) {
224 $enc ||= $self->{ encoding } || $_load_args{DefaultEncoding};
225 $self->{ fh }->print( header( $enc ) );
226 $self->{ header } = 1;
227 }
228 ## print out the record
229 $self->{ fh }->print( record( $record ) ) || croak( $! );
230 return( 1 );
231}
232
233=head2 close()
234
- -
241
# spent 48µs within MARC::File::XML::close which was called: # once (48µs+0s) by MARC::File::XML::DESTROY at line 255
sub close {
24212µs my $self = shift;
24314µs if ( $self->{ fh } ) {
24412µs $self->{ fh }->print( footer() ) if $self->{ header };
245132µs $self->{ fh } = undef;
24612µs $self->{ filename } = undef;
24712µs $self->{ header } = undef;
248 }
24919µs return( 1 );
250}
251
252## makes sure that the XML file is closed off
253
254
# spent 63µs (15+48) within MARC::File::XML::DESTROY which was called: # once (15µs+48µs) by main::NULL at line 0 of conv.pl
sub DESTROY {
255112µs148µs shift->close();
# spent 48µs making 1 call to MARC::File::XML::close
256}
257
258=pod
259
- -
278sub header {
279 my $enc = shift;
280 $enc = shift if ( $enc && (ref($enc) || ($enc eq "MARC::File::XML")) );
281 $enc ||= 'UTF-8';
282 return( <<MARC_XML_HEADER );
283<?xml version="1.0" encoding="$enc"?>
284<collection
285 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
286 xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
287 xmlns="http://www.loc.gov/MARC21/slim">
288MARC_XML_HEADER
289}
290
291=head2 footer()
292
- -
297sub footer {
298 return( "</collection>" );
299}
300
301=head2 record()
302
- -
307sub record {
308 my $record = shift;
309 my $format = shift;
310 my $include_full_record_header = shift;
311 my $enc = shift;
312
313 $format ||= $_load_args{RecordFormat};
314
315 my $_transcode = 0;
316 my $ldr = $record->leader;
317 my $original_encoding = substr($ldr,9,1);
318
319 # Does the record think it is already Unicode?
320 if ($original_encoding ne 'a' && lc($format) !~ /^unimarc/o) {
321 # If not, we'll make it so
322 $_transcode++;
323 substr($ldr,9,1,'a');
324 $record->leader( $ldr );
325 }
326
327 my @xml = ();
328
329 if ($include_full_record_header) {
330 push @xml, <<HEADER
331<?xml version="1.0" encoding="$enc"?>
332<record
333 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
334 xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
335 xmlns="http://www.loc.gov/MARC21/slim">
336HEADER
337
338 } else {
339 push( @xml, "<record>" );
340 }
341
342 push( @xml, " <leader>" . escape( $record->leader ) . "</leader>" );
343
344 foreach my $field ( $record->fields() ) {
345 my ($tag) = escape( $field->tag() );
346 if ( $field->is_control_field() ) {
347 my $data = $field->data;
348 push( @xml, qq( <controlfield tag="$tag">) .
349 escape( ($_transcode ? marc8_to_utf8($data) : $data) ). qq(</controlfield>) );
350 } else {
351 my ($i1) = escape( $field->indicator( 1 ) );
352 my ($i2) = escape( $field->indicator( 2 ) );
353 push( @xml, qq( <datafield tag="$tag" ind1="$i1" ind2="$i2">) );
354 foreach my $subfield ( $field->subfields() ) {
355 my ( $code, $data ) = ( escape( $$subfield[0] ), $$subfield[1] );
356 push( @xml, qq( <subfield code="$code">).
357 escape( ($_transcode ? marc8_to_utf8($data) : $data) ).qq(</subfield>) );
358 }
359 push( @xml, " </datafield>" );
360 }
361 }
362 push( @xml, "</record>\n" );
363
364 if ($_transcode) {
365 substr($ldr,9,1,$original_encoding);
366 $record->leader( $ldr );
367 }
368
369 return( join( "\n", @xml ) );
370}
371
37213µsmy %ESCAPES = (
373 '&' => '&amp;',
374 '<' => '&lt;',
375 '>' => '&gt;',
376);
37738µsmy $ESCAPE_REGEX =
378 eval 'qr/' .
379158µs join( '|', map { $_ = "\Q$_\E" } keys %ESCAPES ) .
# spent 19µs executing statements in string eval
380 '/;'
381 ;
382
383sub escape {
384 my $string = shift;
385 return '' if ! defined $string or $string eq '';
386 $string =~ s/($ESCAPE_REGEX)/$ESCAPES{$1}/oge;
387 return( $string );
388}
389
390
# spent 482ms (243+238) within MARC::File::XML::_next which was called 5000 times, avg 96µs/call: # 5000 times (243ms+238ms) by MARC::File::next at line 104 of MARC/File.pm, avg 96µs/call
sub _next {
39150005.87ms my $self = shift;
39250007.20ms my $fh = $self->{ fh };
393
394 ## return undef at the end of the file
395500047.5ms500010.8ms return if eof($fh);
# spent 10.8ms making 5000 calls to MARC::File::XML::CORE:eof, avg 2µs/call
396
397 ## get a chunk of xml for a record
398500023.8ms local $/ = '</record>';
3995000114ms500086.3ms my $xml = <$fh>;
# spent 86.3ms making 5000 calls to MARC::File::XML::CORE:readline, avg 17µs/call
400
401 ## trim stuff before the start record element
4025000149ms5000127ms $xml =~ s/.*<record.*?>/<record>/s;
# spent 127ms making 5000 calls to MARC::File::XML::CORE:subst, avg 25µs/call
403
404 ## return undef if there isn't a good chunk of xml
405500034.1ms500014.5ms return if ( $xml !~ m|<record>.*</record>|s );
# spent 14.5ms making 5000 calls to MARC::File::XML::CORE:match, avg 3µs/call
406
407 ## return the chunk of xml
4085000106ms return( $xml );
409}
410
411=head2 decode()
412
- -
420
# spent 90.3s (377ms+89.9) within MARC::File::XML::decode which was called 5000 times, avg 18.1ms/call: # 5000 times (377ms+89.9s) by MARC::File::next at line 105 of MARC/File.pm, avg 18.1ms/call
sub decode {
42150005.00ms my $text;
42250006.61ms my $location = '';
42350005.90ms my $self = shift;
424
425 ## see MARC::File::USMARC::decode for explanation of what's going on
426 ## here
427500041.2ms500013.1ms if ( ref($self) =~ /^MARC::File/ ) {
# spent 13.1ms making 5000 calls to MARC::File::XML::CORE:match, avg 3µs/call
428500012.8ms $location = 'in record '.$self->{recnum};
429500011.3ms $text = shift;
430 } else {
431 $location = 'in record 1';
432 $text = $self=~/MARC::File/ ? shift : $self;
433 }
434
435500010.6ms my $enc = shift || $_load_args{BinaryEncoding};
43650008.54ms my $format = shift || $_load_args{RecordFormat};
437
438500029.6ms500049.4ms my $handler = MARC::File::SAX->new();
# spent 49.4ms making 5000 calls to MARC::File::SAX::new, avg 10µs/call
439500033.3ms5000797ms my $parser = $factory->parser(
# spent 797ms making 5000 calls to XML::SAX::ParserFactory::parser, avg 159µs/call
440 Handler => $handler,
441 ProtocolEncoding => $_load_args{DefaultEncoding}
442 );
443500033.1ms500096.5ms $parser->{ Handler }{ toMARC8 } = decideMARC8Binary($format,$enc);
# spent 96.5ms making 5000 calls to MARC::File::XML::decideMARC8Binary, avg 19µs/call
444
445500023.8ms500088.9s $parser->parse_string( $text );
# spent 88.9s making 5000 calls to XML::SAX::Base::parse_string, avg 17.8ms/call
446
4475000159ms500026.7ms return( $handler->record() );
# spent 26.7ms making 5000 calls to MARC::File::SAX::record, avg 5µs/call
448}
449
450
# spent 96.5ms (75.9+20.5) within MARC::File::XML::decideMARC8Binary which was called 5000 times, avg 19µs/call: # 5000 times (75.9ms+20.5ms) by MARC::File::XML::decode at line 443, avg 19µs/call
sub decideMARC8Binary {
45150007.47ms my $format = shift;
45250006.17ms my $enc = shift;
453
454500030.7ms50005.99ms return 0 if (defined($format) && lc($format) =~ /^unimarc/o);
# spent 5.99ms making 5000 calls to MARC::File::XML::CORE:match, avg 1µs/call
455500053.0ms500014.5ms return 0 if (defined($enc) && lc($enc) =~ /^utf-?8/o);
# spent 14.5ms making 5000 calls to MARC::File::XML::CORE:match, avg 3µs/call
456 return 1;
457}
458
459
460=head2 encode()
461
- -
469sub encode {
470 my $record = shift;
471 my $format = shift || $_load_args{RecordFormat};
472 my $without_collection_header = shift;
473 my $enc = shift || $_load_args{DefaultEncoding};
474
475 if (lc($format) =~ /^unimarc/o) {
476 $enc = _unimarc_encoding( $format => $record );
477 }
478
479 my @xml = ();
480 push( @xml, header( $enc ) ) unless ($without_collection_header);
481 # verbose, but naming the header output flags this way to avoid
482 # the potential confusion identified in CPAN bug #34082
483 # http://rt.cpan.org/Public/Bug/Display.html?id=34082
484 my $include_full_record_header = ($without_collection_header) ? 1 : 0;
485 push( @xml, record( $record, $format, $include_full_record_header, $enc ) );
486 push( @xml, footer() ) unless ($without_collection_header);
487
488 return( join( "\n", @xml ) );
489}
490
491sub _unimarc_encoding {
492 my $f = shift;
493 my $r = shift;
494
495 my $pos = 26;
496 $pos = 13 if (lc($f) eq 'unimarcauth');
497
498 my $enc = substr( $r->subfield(100 => 'a'), $pos, 2 );
499
500 if ($enc eq '01' || $enc eq '03') {
501 return 'ISO-8859-1';
502 } elsif ($enc eq '50') {
503 return 'UTF-8';
504 } else {
505 die "Unsupported UNIMARC character encoding [$enc] for XML output for $f; 100\$a -> " . $r->subfield(100 => 'a');
506 }
507}
508
509=head1 TODO
510
- -
543112µs1;
 
# spent 10.8ms within MARC::File::XML::CORE:eof which was called 5000 times, avg 2µs/call: # 5000 times (10.8ms+0s) by MARC::File::XML::_next at line 395, avg 2µs/call
sub MARC::File::XML::CORE:eof; # opcode
# spent 48.2ms within MARC::File::XML::CORE:match which was called 20000 times, avg 2µs/call: # 5000 times (14.5ms+0s) by MARC::File::XML::_next at line 405, avg 3µs/call # 5000 times (14.5ms+0s) by MARC::File::XML::decideMARC8Binary at line 455, avg 3µs/call # 5000 times (13.1ms+0s) by MARC::File::XML::decode at line 427, avg 3µs/call # 5000 times (5.99ms+0s) by MARC::File::XML::decideMARC8Binary at line 454, avg 1µs/call
sub MARC::File::XML::CORE:match; # opcode
# spent 8µs within MARC::File::XML::CORE:qr which was called: # once (8µs+0s) by main::BEGIN@3 at line 1 of (eval 9)[MARC/File/XML.pm:379]
sub MARC::File::XML::CORE:qr; # opcode
# spent 86.3ms within MARC::File::XML::CORE:readline which was called 5000 times, avg 17µs/call: # 5000 times (86.3ms+0s) by MARC::File::XML::_next at line 399, avg 17µs/call
sub MARC::File::XML::CORE:readline; # opcode
# spent 127ms within MARC::File::XML::CORE:subst which was called 5000 times, avg 25µs/call: # 5000 times (127ms+0s) by MARC::File::XML::_next at line 402, avg 25µs/call
sub MARC::File::XML::CORE:subst; # opcode