C4::Tags.pm - Support for user tagging of biblios.

More verbose debugging messages are sent when $ENV{"DEBUG"} is set to a non-zero value.

get_count_by_tag_status

  get_count_by_tag_status($status);

Takes a status and returns the count of tags with that status.

add_tag(biblionumber,term[,borrowernumber])

External Dictionary (Ispell) [Recommended]

An external dictionary can be used as a means of "pre-populating" and tracking allowed terms based on the widely available Ispell dictionary. This can be the system dictionary or a personal version, but in order to support whitelisting, it must be editable by the process running Koha.

To enable, enter the absolute path to the ispell dictionary in the system preference "TagsExternalDictionary".

Using external Ispell is recommended for both ease of use and performance. Note that any language version of Ispell can be installed. It is also possible to modify the dictionary at the command line to achieve the desired content.

WARNING: The default Ispell dictionary includes (properly spelled) obscenities! Users should build their own wordlist and recompile Ispell based on it. See man ispell for instructions.
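As a rough illustration of the "build your own wordlist" step, the sketch below removes unwanted words from a candidate list before it is handed to Ispell. The helper name filter_wordlist and the sample words are assumptions made for this example, not part of C4::Tags, and the Ispell recompilation itself is not shown (see man ispell).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of C4::Tags): return the words from
# $words that do not appear, case-insensitively, in $unwanted.
sub filter_wordlist {
    my ($words, $unwanted) = @_;
    my %skip = map { ( lc($_) => 1 ) } @$unwanted;
    return [ grep { !$skip{ lc($_) } } @$words ];
}

# Made-up sample data standing in for a real word list.
my @words    = qw(health history darn library);
my @unwanted = qw(darn);

my $clean = filter_wordlist(\@words, \@unwanted);
print "$_\n" for @$clean;
```

In practice the input would come from a file, and the cleaned list would be the source for the personal dictionary named in "TagsExternalDictionary".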

Table Structure

The tables used by tags are:

  tags_all
  tags_index
  tags_approval
  tags_blacklist

Your first thought may be that this looks a little complicated. It is, but only because it has to be. I'll try to explain.

tags_all - This table would be all we really need if we didn't care about moderation or performance or tags disappearing when borrowers are removed. Too bad, we do. Otherwise though, it contains all the relevant info about a given tag:

  tag_id         - unique id number for it
  borrowernumber - user that entered it
  biblionumber   - book record it is attached to
  term           - tag "term" itself
  language       - perhaps used later to influence weighting
  date_created   - date and time it was created

tags_approval - Since we need to provide moderation, this table is used to track it. If no external dictionary is used, this table is the sole reference for approval and rejection. With an external dictionary, it tracks pending terms and past whitelist/blacklist actions. This could be called an "approved terms" table. See above regarding the External Dictionary.

  term          - tag "term" itself
  approved      - negative, 0 or positive if tag is rejected, pending or approved
  date_approved - date of last action
  approved_by   - staffer performing the last action
  weight_total  - total occurrence of term in any biblio by any users
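The sign convention of the "approved" column can be sketched as a small lookup. The helper name approval_state is hypothetical and only illustrates the negative/0/positive rule described above. It is not a function provided by C4::Tags.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of C4::Tags): map the numeric "approved"
# value to the moderation state it stands for.
sub approval_state {
    my ($approved) = @_;
    return 'rejected' if $approved < 0;
    return 'approved' if $approved > 0;
    return 'pending';    # exactly 0
}

for my $value (-1, 0, 1) {
    printf "%2d => %s\n", $value, approval_state($value);
}
```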

tags_index - This table is for performance, because by far the most common operation will be fetching tags for a list of search results. We will have a set of biblios, and we will want ONLY their approved tags and overall weighting. While we could implement a query that would traverse tags_all filtered against tags_approval, the performance implications of trying to calculate that and the "weight" (number of times a tag appears) on the fly are drastic.

  term         - approved term as it appears in tags_approval
  biblionumber - book record it is attached to
  weight       - number of times tag applied by any user
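To make the trade-off concrete, here is a minimal in-memory sketch of the calculation that tags_index saves us from repeating on every search: walk the tags_all rows, keep only approved terms, and count occurrences per biblio. The sample rows and the helper name build_index are made up for this example, and the real module may compute weights differently.

```perl
use strict;
use warnings;

# Made-up rows standing in for tags_all: [borrowernumber, biblionumber, term].
my @tags_all = (
    [ 51, 18, 'health' ],
    [ 52, 18, 'health' ],
    [ 51, 22, 'health' ],
    [ 53, 18, 'spam'   ],
);

# Made-up stand-in for tags_approval: term => approved value.
my %tags_approval = ( health => 1, spam => -1 );

# Hypothetical helper: count approved terms per biblio. The resulting
# counts correspond to the "weight" column of tags_index.
sub build_index {
    my ($rows, $approval) = @_;
    my %index;
    for my $row (@$rows) {
        my (undef, $biblionumber, $term) = @$row;
        next unless ($approval->{$term} // 0) > 0;    # approved terms only
        $index{$biblionumber}{$term}++;
    }
    return \%index;
}

my $index = build_index(\@tags_all, \%tags_approval);
for my $biblionumber (sort { $a <=> $b } keys %$index) {
    for my $term (sort keys %{ $index->{$biblionumber} }) {
        print "$biblionumber $term $index->{$biblionumber}{$term}\n";
    }
}
```

Doing this for every result page would mean rescanning tags_all each time, which is why the counts are stored ready-made.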

tags_blacklist - A set of regular expression filters. Unsurprisingly, these should be perl-compatible (PCRE) for your version of perl. Since this is a blacklist, a term will be blocked if it matches any of the given patterns.

WARNING: do not add blacklist regexps if you do not understand their operation and interaction. It is quite easy to define too simple or too complex a regexp and effectively block all terms.

The blacklist operation is fairly resource intensive, since every line of tags_blacklist will need to be read and compared. It is recommended that tags_blacklist be used minimally, and only by an administrator with an understanding of regular expression syntax and performance.
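The "matches any pattern" rule, and how easily a careless pattern blocks everything, can be sketched as follows. The helper name is_blocked and both pattern lists are assumptions for illustration. This is not the module's actual blacklist code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of C4::Tags): a term is blocked if it
# matches ANY of the given patterns.
sub is_blocked {
    my ($term, @patterns) = @_;
    for my $pattern (@patterns) {
        return 1 if $term =~ /$pattern/;
    }
    return 0;
}

my @careful  = ( qr/^spam$/i );    # anchored: blocks only the exact word
my @careless = ( qr/a*/ );         # can match the empty string, so it matches every term

for my $term (qw(health spam history)) {
    printf "%-8s careful=%d careless=%d\n",
        $term, is_blocked($term, @careful), is_blocked($term, @careless);
}
```

Every pattern is tried against every term until one matches, which also shows why a long blacklist is costly.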

So the best way to think about the different tables is that they are each tailored to a certain use. Note that tags_approval and tags_index do not rely on the user's borrower mapping, so the tag population can continue to grow even if a user (along with their corresponding rows in tags_all) is removed.

Tricks

If you want to auto-populate some tags for debugging, do something like this:

  mysql> select biblionumber from biblio where title LIKE "%Health%";
  +--------------+
  | biblionumber |
  +--------------+
  |           18 |
  |           22 |
  |           24 |
  |           30 |
  |           44 |
  |           45 |
  |           46 |
  |           49 |
  |          111 |
  |          113 |
  |          128 |
  |          146 |
  |          155 |
  |          518 |
  |          522 |
  |          524 |
  |          530 |
  |          544 |
  |          545 |
  |          546 |
  |          549 |
  |          611 |
  |          613 |
  |          628 |
  |          646 |
  |          655 |
  +--------------+
  26 rows in set (0.00 sec)

Then, take those numbers and type/pipe them into this perl command line:

  perl -ne 'use C4::Tags qw(get_tags add_tag); use Data::Dumper;chomp; add_tag($_,"health",51,1); print Dumper get_tags({limit=>5,term=>"health",});'