<<

NAME

C4::Matcher - find MARC records matching another one

SYNOPSIS

  my @matchers = C4::Matcher::GetMatcherList();

  my $matcher = C4::Matcher->new($record_type);
  $matcher->threshold($threshold);
  $matcher->code($code);
  $matcher->description($description);

  $matcher->add_simple_matchpoint('isbn', 1000, '020', 'a', -1, 0, '');
  $matcher->add_simple_matchpoint('Date', 1000, '008', '', 7, 4, '');
  $matcher->add_matchpoint('isbn', 1000, [ { tag => '020', subfields => 'a', norms => [] } ]);

  $matcher->add_simple_required_check('245', 'a', -1, 0, '', '245', 'a', -1, 0, '');
  $matcher->add_required_check([ { tag => '245', subfields => 'a', norms => [] } ], 
                               [ { tag => '245', subfields => 'a', norms => [] } ]);

  my @matches = $matcher->get_matches($marc_record, $max_matches);

  foreach my $match (@matches) {

      # matches already sorted in order of
      # decreasing score
      print "record ID: $match->{'record_id'}\n";
      print "score:     $match->{'score'}\n";

  }

  my $matcher_description = $matcher->dump();

FUNCTIONS

GetMatcherList

  my @matchers = C4::Matcher::GetMatcherList();

Returns an array of hashrefs listing all matchers present in the database. Each hashref includes:

 * matcher_id
 * code
 * description

GetMatcherId

  my $matcher_id = C4::Matcher::GetMatcherId($code);

Returns the matcher_id of the matcher with the given code.

METHODS

new

  my $matcher = C4::Matcher->new($record_type, $threshold);

Creates a new Matcher. $record_type indicates which search database to use, e.g., 'biblio' or 'authority', and defaults to 'biblio'. $threshold is the minimum score required for a match and defaults to 1000.

fetch

  my $matcher = C4::Matcher->fetch($id);

Creates a matcher object from the version stored in the database. If a matcher with the given id does not exist, returns undef.

store

  my $id = $matcher->store();

Stores the matcher in the database and returns the ID of its marc_matchers row. If the matcher was previously retrieved from the database via the fetch() method, its existing DB representation is replaced.

delete

  C4::Matcher->delete($id);

Deletes the matcher with the specified ID from the database.

threshold

  $matcher->threshold(1000);
  my $threshold = $matcher->threshold();

Accessor method.

_id

  $matcher->_id(123);
  my $id = $matcher->_id();

Accessor method. Outside of the editing CGI, this method should not be used to set the matcher's DB ID.

code

  $matcher->code('ISBN');
  my $code = $matcher->code();

Accessor method.

description

  $matcher->description('match on ISBN');
  my $description = $matcher->description();

Accessor method.

add_matchpoint

  $matcher->add_matchpoint($index, $score, $matchcomponents);

Adds a matchpoint that may include multiple components. The $index parameter identifies the index that will be searched, while $score is the weight that will be added if a match is found.

$matchcomponents should be a reference to an array of matchpoint components, each of which should be a hash reference with the following keys: tag, subfields, offset, length, and norms.

The norms value should in turn be a reference to an array, each element of which should be a reference to a normalization subroutine (under C4::Normalize) to be applied to the source string.
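To make the component keys concrete, here is a minimal pure-Perl sketch of how a single component could turn a record into match keys, one per repeat of its tag. It does not use C4::Matcher or MARC::Record: the simplified record layout, the helper name component_keys, joining subfields with a space, and applying offset/length only when length is positive are all assumptions for illustration. How several components are combined into one key is not shown.

```perl
use strict;
use warnings;

# Hypothetical, simplified record model (not MARC::Record): each tag maps to
# a list of repeats. Control fields (e.g. 008) are plain strings; data fields
# are lists of [subfield_code, value] pairs in field order.

# Return one key per repeat of $component->{tag}.
sub component_keys {
    my ($record, $component) = @_;
    my @keys;
    foreach my $field (@{ $record->{ $component->{tag} } || [] }) {
        my $string;
        if (ref $field) {
            # Keep the requested subfields, in field order, joined by spaces.
            my %wanted = map { ($_ => 1) } split //, $component->{subfields};
            $string = join ' ', map { $_->[1] } grep { $wanted{ $_->[0] } } @$field;
        }
        else {
            # Control field: the whole string is the source.
            $string = $field;
        }
        # Assumption: offset/length are applied only when length is positive.
        my $length = $component->{length} || 0;
        if ($length > 0) {
            $string = substr($string, $component->{offset} || 0, $length);
        }
        # Apply each normalizer (a code ref) in turn.
        foreach my $norm (@{ $component->{norms} || [] }) {
            $string = $norm->($string);
        }
        push @keys, $string;
    }
    return @keys;
}

# Example: ISBN from 020$a with hyphens stripped, and a date from 008/07-10.
my $record = {
    '008' => ['850101s1999    xxu'],
    '020' => [ [ [ 'a', '0-596-00027-8' ], [ 'q', 'pbk' ] ] ],
};
my $strip_hyphens = sub { my $s = shift; $s =~ s/-//g; return $s };
print join(',', component_keys($record, { tag => '020', subfields => 'a', norms => [$strip_hyphens] })), "\n";    # prints "0596000278"
print join(',', component_keys($record, { tag => '008', subfields => '', offset => 7, length => 4, norms => [] })), "\n";    # prints "1999"
```

Returning one key per repeat reflects that a regular matchpoint looks at every occurrence of its tag, unlike a required check.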

add_simple_matchpoint

  $matcher->add_simple_matchpoint($index, $score, $source_tag,
                            $source_subfields, $source_offset, 
                            $source_length, $source_normalizer);

Adds a simple matchpoint rule: compose a key from the source tag and subfields, normalize it with the normalization function, and search the index. All records retrieved receive the assigned score.
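Judging from the SYNOPSIS, where the same ISBN rule is written both ways, a simple matchpoint appears to be a matchpoint with exactly one component. The sketch below shows that mapping; simple_to_matchpoint is a hypothetical helper, not part of C4::Matcher, and treating an empty normalizer as "no norms" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of C4::Matcher): expand the positional
# arguments of add_simple_matchpoint() into the ($index, $score,
# $matchcomponents) arguments accepted by add_matchpoint().
sub simple_to_matchpoint {
    my ($index, $score, $tag, $subfields, $offset, $length, $normalizer) = @_;
    my %component = (
        tag       => $tag,
        subfields => $subfields,
        offset    => $offset,
        length    => $length,
        # Assumption: an empty or undefined normalizer means "no normalization".
        norms     => (defined $normalizer && $normalizer ne '') ? [$normalizer] : [],
    );
    # A simple matchpoint is a matchpoint with exactly one component.
    return ($index, $score, [ \%component ]);
}

my ($index, $score, $components) = simple_to_matchpoint('isbn', 1000, '020', 'a', -1, 0, '');
print "$index $score $components->[0]{tag} $components->[0]{subfields}\n";    # prints "isbn 1000 020 a"
```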

add_required_check

  $matcher->add_required_check($source_matchpoint, $target_matchpoint);

Adds a required check definition. A required check means that in order for a match to be considered valid, the key derived from the source (incoming) record must match the key derived from the target (already in DB) record.

Unlike a regular matchpoint, only the first repeat of each tag in the source and target match criteria is considered.

A typical example of a required check would be verifying that the titles and publication dates match.

$source_matchpoint and $target_matchpoint are each a reference to an array of hashes, where each hash follows the same definition as the matchpoint component specification in add_matchpoint, i.e.,

    tag
    subfields
    offset
    length
    norms

The norms value should in turn be a reference to an array, each element of which should be a reference to a normalization subroutine (under C4::Normalize) to be applied to the source string.
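The comparison a required check performs, under the first-repeat rule described above, can be sketched as follows. The simplified record layout (tag => list of hashrefs of subfield values), the helper names first_repeat_key and passes_required_check, and the space-joined key are assumptions made for this illustration, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical, simplified record model: tag => [ repeat, ... ], where each
# repeat is a hashref of subfield code => value.

# Build a key from the FIRST repeat of each component's tag only. Subfields
# are taken in the order listed in 'subfields'; joining parts with a space is
# an assumption of this sketch.
sub first_repeat_key {
    my ($record, $components) = @_;
    my @parts;
    foreach my $c (@$components) {
        my $field = ($record->{ $c->{tag} } || [])->[0];
        next unless $field;    # tag absent: contributes nothing
        my $part = join ' ', grep { defined } map { $field->{$_} } split //, $c->{subfields};
        foreach my $norm (@{ $c->{norms} || [] }) {
            $part = $norm->($part);
        }
        push @parts, $part;
    }
    return join ' ', @parts;
}

# The check passes only when the source and target keys are identical.
# Note: in this sketch two records that both lack the tags yield equal
# (empty) keys and therefore pass.
sub passes_required_check {
    my ($source, $target, $source_mp, $target_mp) = @_;
    return first_repeat_key($source, $source_mp) eq first_repeat_key($target, $target_mp);
}

my $title_mp = [ { tag => '245', subfields => 'a', norms => [ sub { lc $_[0] } ] } ];
my $incoming = { '245' => [ { a => 'Perl Cookbook' }, { a => 'Another title' } ] };
my $existing = { '245' => [ { a => 'PERL COOKBOOK' } ] };
my $verdict  = passes_required_check($incoming, $existing, $title_mp, $title_mp) ? 'valid' : 'rejected';
print "$verdict\n";    # prints "valid"
```

The second 245 in the incoming record is ignored, so only the first title has to agree.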

add_simple_required_check

  $matcher->add_simple_required_check($source_tag, $source_subfields,
                $source_offset, $source_length, $source_normalizer, 
                $target_tag, $target_subfields, $target_offset, 
                $target_length, $target_normalizer);

Adds a required check, which requires that the normalized keys built from the source and target records match for a match to be considered valid.
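The ten positional arguments form two groups of five, source first and target second, mirroring the two array references accepted by add_required_check (compare the two calls in the SYNOPSIS). A minimal sketch of that split; simple_to_required_check is a hypothetical helper, and the empty-normalizer rule is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of C4::Matcher): split the ten positional
# arguments of add_simple_required_check() into the two one-component
# matchpoints that add_required_check() takes.
sub simple_to_required_check {
    my @args = @_;
    die "expected 10 arguments\n" unless @args == 10;
    my @sides;
    foreach my $side ([ @args[0 .. 4] ], [ @args[5 .. 9] ]) {
        my ($tag, $subfields, $offset, $length, $normalizer) = @$side;
        push @sides, [ {
            tag       => $tag,
            subfields => $subfields,
            offset    => $offset,
            length    => $length,
            # Assumption: an empty normalizer means "no normalization".
            norms     => (defined $normalizer && $normalizer ne '') ? [$normalizer] : [],
        } ];
    }
    return @sides;    # ($source_matchpoint, $target_matchpoint)
}

my ($source_mp, $target_mp) =
    simple_to_required_check('245', 'a', -1, 0, '', '245', 'a', -1, 0, '');
print "$source_mp->[0]{tag} vs $target_mp->[0]{tag}\n";    # prints "245 vs 245"
```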

get_matches

  my @matches = $matcher->get_matches($marc_record, $max_matches);
  foreach my $match (@matches) {
      # matches already sorted in order of
      # decreasing score
      print "record ID: $match->{'record_id'}\n";
      print "score:     $match->{'score'}\n";
  }

Identifies all of the records matching the given MARC record. For a record already in the database to be considered a match, it must meet the following criteria:

1. Total score from its matching fields must reach the matcher's threshold.
2. It must pass all required checks.

Only the top $max_matches matches are returned. The returned array is sorted in order of decreasing score, i.e., the best match is first.
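The scoring, filtering, and ordering steps can be sketched in plain Perl. In this sketch the index searches are replaced by a hypothetical list of per-matchpoint hits, the required checks by a caller-supplied code ref, and rank_matches is an illustrative name, not a C4::Matcher function. Whether a total equal to the threshold qualifies is an assumption made explicit in the code.

```perl
use strict;
use warnings;

# Sketch of the ranking step only (rank_matches is a hypothetical name).
# $hits stands in for the index searches: one entry per matchpoint, holding
# that matchpoint's score and the record IDs its search retrieved.
# $passes_checks is a code ref standing in for "passes all required checks".
sub rank_matches {
    my ($hits, $threshold, $max_matches, $passes_checks) = @_;
    my %score;
    foreach my $hit (@$hits) {
        # Assumption: a record earns a matchpoint's score at most once.
        my %seen;
        foreach my $id (grep { !$seen{$_}++ } @{ $hit->{record_ids} }) {
            $score{$id} += $hit->{score};
        }
    }
    # Assumption: a total equal to the threshold is enough to qualify.
    my @kept = grep { $score{$_} >= $threshold && $passes_checks->($_) } keys %score;
    # Decreasing score; ties broken by record ID so the order is stable.
    my @sorted = sort { $score{$b} <=> $score{$a} || $a cmp $b } @kept;
    splice(@sorted, $max_matches) if @sorted > $max_matches;
    return map { +{ record_id => $_, score => $score{$_} } } @sorted;
}

my @matches = rank_matches(
    [ { score => 1000, record_ids => [ 1, 2 ] },    # hypothetical ISBN matchpoint hits
      { score => 500,  record_ids => [ 2, 3 ] } ],  # hypothetical date matchpoint hits
    1000, 10, sub { 1 },
);
foreach my $match (@matches) {
    print "record ID: $match->{'record_id'}\n";
    print "score:     $match->{'score'}\n";
}
```

Here record 2 (1500) is listed before record 1 (1000), while record 3 (500) falls below the threshold and is dropped.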

dump

  my $description = $matcher->dump();

Returns a reference to a structure containing all of the information in the matcher object. This is mainly a convenience method to aid setting up an HTML editing form.

AUTHOR

Koha Development Team <http://koha-community.org/>

Galen Charlton <galen.charlton@liblime.com>

<<