C4::Matcher - find MARC records matching another one
    my @matchers = C4::Matcher::GetMatcherList();

    my $matcher = C4::Matcher->new($record_type);
    $matcher->threshold($threshold);
    $matcher->code($code);
    $matcher->description($description);

    $matcher->add_simple_matchpoint('isbn', 1000, '020', 'a', -1, 0, '');
    $matcher->add_simple_matchpoint('Date', 1000, '008', '', 7, 4, '');
    $matcher->add_matchpoint('isbn', 1000, [ { tag => '020', subfields => 'a', norms => [] } ]);

    $matcher->add_simple_required_check('245', 'a', -1, 0, '', '245', 'a', -1, 0, '');
    $matcher->add_required_check([ { tag => '245', subfields => 'a', norms => [] } ],
                                 [ { tag => '245', subfields => 'a', norms => [] } ]);

    my @matches = $matcher->get_matches($marc_record, $max_matches);

    # matches are already sorted in order of decreasing score
    foreach my $match (@matches) {
        print "record ID: $match->{'record_id'}\n";
        print "score: $match->{'score'}\n";
    }

    my $matcher_description = $matcher->dump();
my @matchers = C4::Matcher::GetMatcherList();
Returns an array of hashrefs listing all matchers present in the database. Each hashref includes:
* matcher_id
* code
* description
my $matcher = C4::Matcher->new($record_type, $threshold);
Creates a new Matcher. $record_type indicates which search database to use, e.g., 'biblio' or 'authority', and defaults to 'biblio', while $threshold is the minimum score required for a match and defaults to 1000.
my $matcher = C4::Matcher->fetch($id);
Creates a matcher object from the version stored in the database. If a matcher with the given id does not exist, returns undef.
my $id = $matcher->store();
Stores the matcher in the database. The return value is the ID of the marc_matchers row. If the matcher was previously retrieved from the database via the fetch() method, the DB representation of the matcher is replaced.
C4::Matcher->delete($id);
Deletes the matcher of the specified ID from the database.
$matcher->threshold(1000); my $threshold = $matcher->threshold();
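Accessor method for the match threshold, i.e., the minimum score required for a match. Pass a value to set it; call it without arguments to read it.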
$matcher->_id(123); my $id = $matcher->_id();
Accessor method for the DB ID of the matcher. Do not use it to set the ID outside of the editing CGI.
$matcher->code('ISBN'); my $code = $matcher->code();
Accessor method for the matcher's code.
$matcher->description('match on ISBN'); my $description = $matcher->description();
Accessor method for the matcher's description.
$matcher->add_matchpoint($index, $score, $matchcomponents);
Adds a matchpoint that may include multiple components. The $index parameter identifies the index that will be searched, while $score is the weight that will be added if a match is found.
$matchcomponents should be a reference to an array of matchpoint components, each of which should be a hash containing the following keys:
* tag
* subfields
* offset
* length
* norms
The norms value should in turn be a reference to an array, each element of which should be a reference to a normalization subroutine (under C4::Normalize) to be applied to the source string.
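As a rough illustration of how one matchpoint component could turn a field value into a search key, here is a minimal, self-contained sketch. It does not call C4::Matcher: build_key, strip_hyphens and upper_case are hypothetical names, and the treatment of offset and length (a positive length selects a substring, anything else keeps the whole value) is an assumption inferred from the synopsis, not confirmed behaviour.

```perl
use strict;
use warnings;

# Hypothetical normalizers standing in for routines under C4::Normalize.
sub strip_hyphens { my ($s) = @_; $s =~ s/-//g; return $s; }
sub upper_case    { my ($s) = @_; return uc $s; }

# Assumed semantics: a positive length selects a substring starting at
# offset; otherwise the whole value is used. Norms are applied in order.
sub build_key {
    my ($value, $component) = @_;
    my $offset = $component->{offset} // 0;
    my $length = $component->{length} // 0;
    my $key    = $length > 0 ? substr($value, $offset, $length) : $value;
    $key = $_->($key) for @{ $component->{norms} || [] };
    return $key;
}

my $isbn_component = {
    tag => '020', subfields => 'a', offset => -1, length => 0,
    norms => [ \&strip_hyphens, \&upper_case ],
};
my $date_component = {
    tag => '008', subfields => '', offset => 7, length => 4, norms => [],
};

print build_key('0-306-40615-x', $isbn_component), "\n";    # 030640615X
print build_key('850101s1985    nyu           000 0 eng d',
                $date_component), "\n";                     # 1985
```

Keeping norms as a list of code references lets several normalizations be chained without the key builder knowing anything about them.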
$matcher->add_simple_matchpoint($index, $score, $source_tag, $source_subfields, $source_offset, $source_length, $source_normalizer);
Adds a simple matchpoint rule: a key is composed from the source tag and subfields, normalized by the normalization function, and used to search the index. All records retrieved will receive the assigned score.
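The synopsis suggests that a simple matchpoint is shorthand for a one-component call to add_matchpoint. The sketch below shows one way the flat arguments might be folded into that component list. simple_to_components is a hypothetical helper, and treating an empty normalizer as "no normalization" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: fold the flat arguments of a simple matchpoint
# into the single-element component list that add_matchpoint expects.
sub simple_to_components {
    my ($tag, $subfields, $offset, $length, $normalizer) = @_;
    # Assumption: an empty or undefined normalizer means no normalization.
    my @norms = (defined $normalizer && $normalizer ne '') ? ($normalizer) : ();
    return [
        {
            tag       => $tag,
            subfields => $subfields,
            offset    => $offset,
            length    => $length,
            norms     => \@norms,
        },
    ];
}

my $components = simple_to_components('020', 'a', -1, 0, '');
printf "tag=%s subfields=%s norms=%d\n",
    $components->[0]{tag},
    $components->[0]{subfields},
    scalar @{ $components->[0]{norms} };
```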
$matcher->add_required_check($source_matchpoint, $target_matchpoint);
Adds a required check definition. A required check means that in order for a match to be considered valid, the key derived from the source (incoming) record must match the key derived from the target (already in DB) record.
Unlike a regular matchpoint, only the first repeat of each tag in the source and target match criteria is considered.
A typical example of a required check would be verifying that the titles and publication dates match.
$source_matchpoint and $target_matchpoint are each a reference to an array of hashes, where each hash follows the same definition as the matchpoint component specification in add_matchpoint, i.e.,
* tag
* subfields
* offset
* length
* norms
The norms value should in turn be a reference to an array, each element of which should be a reference to a normalization subroutine (under C4::Normalize) to be applied to the source string.
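The comparison behind a required check can be sketched as follows. This is an illustration only: the record model (a hash mapping each tag to a list of repeated field values) and the helpers first_repeat_key and passes_required_check are assumptions made for the example, not part of C4::Matcher.

```perl
use strict;
use warnings;

# Toy record model (an assumption for illustration): tag => list of
# repeated field values. Real code would read a MARC record object.
sub first_repeat_key {
    my ($record, $matchpoint) = @_;
    my @parts;
    for my $component (@$matchpoint) {
        my $repeats = $record->{ $component->{tag} } || [];
        next unless @$repeats;
        my $value = $repeats->[0];    # only the first repeat is considered
        $value = $_->($value) for @{ $component->{norms} || [] };
        push @parts, $value;
    }
    return join ' ', @parts;
}

# The check passes when the source key equals the target key.
sub passes_required_check {
    my ($source, $target, $source_mp, $target_mp) = @_;
    return first_repeat_key($source, $source_mp)
        eq first_repeat_key($target, $target_mp);
}

my $lower = sub { lc $_[0] };
my $title = [ { tag => '245', subfields => 'a', norms => [$lower] } ];

my $incoming = { '245' => ['Moby Dick'] };
my $existing = { '245' => ['MOBY DICK', 'Another title'] };
my $other    = { '245' => ['Bleak House'] };

print passes_required_check($incoming, $existing, $title, $title) ? "pass\n" : "fail\n";
print passes_required_check($incoming, $other,    $title, $title) ? "pass\n" : "fail\n";
```

The second 245 value of the existing record plays no part in the comparison, which mirrors the first-repeat rule described above.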
$matcher->add_simple_required_check($source_tag, $source_subfields, $source_offset, $source_length, $source_normalizer, $target_tag, $target_subfields, $target_offset, $target_length, $target_normalizer);
Adds a required check: the normalized keys made from the source and target records must be equal for a match to be considered valid.
    my @matches = $matcher->get_matches($marc_record, $max_matches);

    # matches are already sorted in order of decreasing score
    foreach my $match (@matches) {
        print "record ID: $match->{'record_id'}\n";
        print "score: $match->{'score'}\n";
    }
Identifies all of the records matching the given MARC record. For a record already in the database to be considered a match, it must meet the following criteria:
* Its total score must not fall below the matcher's threshold.
* It must pass all required checks.
Only the top $max_matches matches are returned. The returned array is sorted in order of decreasing score, i.e., the best match is first.
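The scoring, sorting and truncation described here can be sketched in plain Perl. The input shape (one entry per matchpoint, holding its score and the record IDs its search returned) and the name rank_matches are assumptions for illustration. Required checks are left out, and the threshold is treated as an inclusive minimum.

```perl
use strict;
use warnings;

# Hypothetical input: one entry per matchpoint, with its score and the
# record IDs returned by its index search.
sub rank_matches {
    my ($hits_per_matchpoint, $threshold, $max_matches) = @_;

    # Each matchpoint adds its score to every record it retrieved.
    my %score_for;
    for my $hit (@$hits_per_matchpoint) {
        $score_for{$_} += $hit->{score} for @{ $hit->{record_ids} };
    }

    # Keep records at or above the threshold, best score first
    # (ties broken by record ID so the order is deterministic).
    my @matches =
        map  { +{ record_id => $_, score => $score_for{$_} } }
        sort { $score_for{$b} <=> $score_for{$a} || $a <=> $b }
        grep { $score_for{$_} >= $threshold }
        keys %score_for;

    # Return at most $max_matches entries.
    splice @matches, $max_matches if @matches > $max_matches;
    return @matches;
}

my @hits = (
    { score => 1000, record_ids => [ 101, 102 ] },
    { score => 500,  record_ids => [ 102, 103 ] },
);

for my $match (rank_matches(\@hits, 1000, 5)) {
    print "record ID: $match->{record_id}; score: $match->{score}\n";
}
```

With these sample hits, record 102 scores 1500 and record 101 scores 1000, while record 103 (500) falls below the threshold and is dropped.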
$description = $matcher->dump();
Returns a reference to a structure containing all of the information in the matcher object. This is mainly a convenience method to aid setting up an HTML editing form.
Koha Development Team <info@koha.org>
Galen Charlton <galen.charlton@liblime.com>