LORLS

Twenty years of LORLS

It doesn’t seem like five minutes ago since we were celebrating the 15th anniversary of the launch of LORLS and now here we are now at its 20th. Unfortunately the current lockdown prevents us from celebrating its birthday in the usual manner (i.e. with cake).

LORLS was initially conceived of in 1999 in response to an enquiry to the Library from the University’s Learning & Teaching Committee. The system was written using the open source LAMP development stack and launched the following year. Since then it has been used by a dozen other institutions, survived six major revisions, three different library management systems and seen the rise and fall of numerous other reading list management systems.

So what does the future hold for LORLS, well the sad truth is that all of the staff involved in its development have either moved on to take on new responsibilities or left the institution. And finally after 20 years there are now commercial offerings that at least meet, if not exceed, the capabilities of our little in-house system. So whilst LORLS is not yet dead, it is more than likely it will be taking a very well deserved retirement at some point in the coming years.

A Newcomer’s guide to installing “LORLS In A Box”

As has been mentioned in this blog before, back in the days when the MALS team was still the Library Systems team, they developed the Loughborough Online Reading List System (LORLS) to manage the resources for directed student reading.

A recent rebranding of the University’s Library online presence means we now have a requirement to change the styling of our local installation of LORLS. I am still relatively new to the team and as yet have not had cause to look at the front end of LORLS, known affectionately as CLUMP, and this seemed like an ideal introduction for me.

So where better to start than with the documentation the team already put together. My first port of call is the installation instructions where I discover that some thoughtful techie has already built me a VM to play with. Reading through the guide to the VM you can tell the techie in question was Jon, the passwords used are a good clue but the giveaway is the advice to make a cup of tea and eat a biscuit whilst waiting for the download.

The download and subsequent import into Virtual Box seem to happen with a minimum of fuss but here is where I make my first rookie mistake as I choose to reinitialise the MAC address of all network interfaces.

CentOS maintains a mapping of MAC address to interface IDs and so it spots that the VM no longer has the MAC address it associated with eth0 but does have an entirely new MAC address which it associates with eth1. The configuration of the VMs NIC is tied to eth0 and so I don’t have a working network connection.

This is quick and easy to fix. First I head to /etc/sysconfig/network-scripts where I rename the file ifcfg-eth0 to ifcfg-eth1 (this is not essential but helps with sanity). I then edit this file and change the DEVICE value to be eth1 and update the HWADDR value to be the new MAC address which can be found using ifconfig -a. Restart the network service and all is well.

Of course if you don’t reinitialise the MAC address when importing the VM then you shouldn’t see this issue and it should just work straight away.

When starting the VM, as described in Jon’s instructions, displayed above the login prompt I am shown the IP address assigned to the VM by DHCP and the URL for my LORLS instance. Plugging these into my browser takes me to a vanilla installation of LORLS running on my VM. One note here, be sure to type CLUMP and not clump it is case sensitive.

So all pretty straightforward to get up and running, in fact I am pleased I made the mistake with the MAC address as there would have been little of note to write about otherwise. Now onto setup and customisation but I may save that for another “Newcomer’s guide to LORLS” blog post in the future.

LORLS Implementation at DBS

I note with great interest that Dublin Business School has recently had an article accepted in the New Review of Academic Librarianship, regarding their faculties’ perceptions of LORLS.

In the article Marie O’Neill and Lara Musto discuss a survey of faculty staff at DBS which reveals that their awareness of the system is greatly impacted by the amount of time they spend teaching. They also show that promoting appropriate resources to students and improving communication between faculty and library staff are seen as major advantages of having a RLMS. I particularly liked the following quote that came out of one of their focus groups:

“One of the challenges nowadays is recognising that students are reading more than books and articles. They are reading the review section of IMDb for example. Reading lists have to change and our perception of reading lists.”

One of the other outputs of the article was a process implementation chart, which was created to inform other institutions how they might best implement LORLS. This chart is reproduced below with the kind permission of the authors.

The article concludes with a strong desire from DBS faculty for greater integration with their Moodle VLE system. This is something that we are actively investigating and we have begun to pilot a Moodle plug-in at Loughborough, which we hope to include in a future release of the LORLS software.

Word document importer

At this year’s Meeting the Reading List Challenge (MTRLC) workshop, my boss Gary Brewerton demonstrated one of the features we have in LORLS: the ability to ingest a Word document that contains Harvard(ish) citations. Our script reads in a Office Open XML.docx format Word document and spits out some structured data ready to import into a LORLS reading list.  The idea behind this is that academics still create reading lists in Word, despite us having had an online system for 15 years now. Anything we can do to make getting these Word documents into LORLS easier for them, the more likely it is that we’ll actually get to see the data. We’ve had this feature for a while now, and its one of those bits of code that we revisit every so often when we come across new Word documents that it doesn’t handle as well as we’d like.

The folk at MTRLC seemed to like it, and Gary suggested that I yank the core of the import code out of LORLS, bash it around a bit and then make it available as a standalone program for people to play with, including sites that don’t use LORLS.  So that’s what I’ve done – you can download the single script from:

https://lorls.lboro.ac.uk/WordImporter/WordImporter

The code is, as with the rest of LORLS, written in Perl. It makes heavy use of regular expression pattern matching and Z39.50 look ups to do its work.  It is intended to run as a CGI script, so you’ll need to drop it on a machine with a web server.  It also uses some Perl modules from CPAN that you’ll need to make sure are installed:

  • Data::Dumper
  • Algorithm::Diff
  • Archive::Any
  • XML::Simple
  • JSON
  • IO::File
  • ZOOM
  • CGI

The code has been developed and run under Linux (specifically Debian Jessie and then CentOS 6) with the Apache web server.  It doesn’t do anything terribly exciting with CGI though, so it should probably run OK on other platforms as long as you have  working Perl interpreter and the above modules installed. As distributed its looks at the public Bodleian Library Z39.50 server in Oxford, but you’ll probably want to point it at your own library system’s Z39.50 server (the variable names are pretty self-explanatory in the code!).

This script gives a couple of options for output.  The first is RIS format, which is an citation interchange format that quite a few systems accept.  It also has the option of JSON output if you want to suck the data back into your own code.  If you opt for JSON format you can also include a callback function name so that you can use JSONP (and thus make use of this script in Javascript code running in web browsers).

We hope you find this script useful.  And if you do feel up to tweaking and improving it, we’d love to get patches and fixes back!

The continuing battle of parsing Word documents for reading list material

At this year’s Meeting The Reading List Challenge (MTLRC) workshop, Gary Brewerton (my boss) showed the delegates one of our LORLS features: the ability to suck citation data out of Word .docx documents.  We’ve had this for a few years and it is intended to allow academics to take existing reading lists that they have produced in Word and import them relatively easily into our electronic reading lists system. The nice front end was written by my colleague Jason Cooper, but I was responsible for the underlying guts that the APIs call to try parsing the Word document and turn it into structured data that LORLS can understand and use. We wrote it originally based on suggestions from a few academics who already had reading lists with Harvard style references in them, and they used it to quickly populate LORLS with their data.

Shortly after the MTRLC workshop, Gary met with some other academics who also needed to import existing reading lists into LORLS.  He showed them our existing importer and, whilst it worked, it left quite alot entries as “notes”, meaning it couldn’t parse them into structured data.  Gary then asked me to take another look at the backend code and see if I could improve its recognition rate.

I had a set of “test” lists donated by the academics of varying lengths, all from the same department.  With the existing code, in some cases less than 50% of the items in these documents were recognised and classified correctly.  Of those, some were misclassified (eg book chapters appearing as books).

The existing LORLS .docx import code used Perl regular expression pattern matching alone to try to work out what sort of work a citation referred to this.  This worked OK with Word documents where the citations were well formed.  A brief glance through the new lists showed that lots of the citations were not well formed.  Indeed the citation style and layout seemed to vary from item to item, probably because they had been collected over a period of years by a variety of academics.  Whilst I could modify some of the pattern matches to help recognise some of the more obvious cases, it was clear that the code was going to need something extra.

That extra turned out to be Z39.50 look ups.  We realised that the initial pattern matches could be quite generalised to see if we could extract out authors, titles, publishers and dates, and then use those to do a Z39.50 look up.  Lots of the citations also had classmarks attached, so we’d got a good clue that many of the works did exist in the library catalogue.  This initial pattern match was still done using regular expressions and needed quite a lot of tweaking to get recognition accuracy up.  For example spotting publishers separated from titles can be “interesting”, especially if the title isn’t delimited properly.  We can spot some common cases, such as publishers located in London, New York or in US states with two letter abbreviations.  It isn’t fool proof, but its better than nothing.

However using this left us with only around 5% of the entries in the documents classified as unstructured notes when visual checking indicated that they were probably citations. These remaining items are currently left as notes, as there are a number of reasons why the code can’t parse them.

The number one reason is that they don’t have enough punctuation and/or formatting in the citation to allow regular expression to determine which parts are which. In some cases the layout doesn’t even really bear much relation to any formal Harvard style – the order of authors and titles can some time switch round and in some cases it isn’t clear where the title finishes and the publisher ends. In a way they’re a good example to students of the sort of thing they shouldn’t have in their own referencing!

The same problems encountered using the regular expressions would happen with a formal parser as these entries are effectively “syntax errors” if treating Harvard citations as a sort of grammar.  This is probably about the best we’ll be able to do for a while, at least until we’ve got some deep AI that can actually read the text and understand it, rather that just scan for patterns.

And if we reach that point LORLS will probably become self aware… now there’s a scary thought!

Happy birthday LORLS!

Fifteen years ago a reading list system (initially unnamed) was launched at Loughborough University. The system allowed staff to input reading lists item-by-item using a HTML form and display these web-based lists to our students with links to the library catalogue to check stock availability.

Since that time the system has undergone six major revisions. These revisions have extended the functionality to allow “drag-and-drop” reordering of lists, display of stock availability and book covers, importing of citations from Word documents and so much more! Also the system gained a name, LORLS.

So today we’re celebrating fifteen years of LORLS!

IMAG0098_BURST002_COVER

We’ve even had birthday greetings and cards from our friends at PTFS Europe and Talis.

 ptfs

talis-2

Thanks guys!

Extending the Structural Unit Types available

One additional Structural Unit Type (SUT) that we have been asked for is Audio Visual (AV) material, e.g. CDs, DVDs, Film, etc.  While we’ve manually added an AV SUT to our local instance we didn’t have an easy way to extend this to other instances of LORLS.

So to tackle this we have put together a quick Perl script that can be run from the command line which adds in the new AV SUT. If your LORLS install doesn’t have an AV Material SUT and you would like to add it then here are the instructions to do so:

  1. Back up your LORLS install (Don’t forget the database as this will be altered)
  2. Download the latest extendSUTs script (e.g. wget “https://blog.lboro.ac.uk/lorls/wp-content/uploads/sites/3/2014/11/extendSUTs”)
  3. Make the script executable (e.g. chmod +x extendSUTs)
  4. Run the script (e.g. ./extendSUTs –database=<database> –user=<database user>)
  5. When prompted enter the database user’s password
  6. If the script fails due to missing the Term::ReadKey Perl module then install it and try the script again (RedHat/CentOS should just need to run “sudo yum install perl-TermReadKey”)
  7. Once the script has run open a new browser session and try adding a new AV Material entry to a test list.

 

Enhancing the look and feel of tables

Recently we have looked at improving some of the tables in CLUMP.  At first we thought that it would involve quite a bit of work, but then we came across the DataTables jQuery plugin.  After a couple of days of coding we’ve used it to enhance a number of tables on our development version of CLUMP.  Key features that attracted us to it are:

Licencing

DataTables is made available under an MIT Licence, so it is very developer friendly.

Easy to apply to existing tables

As one of the many data sources that DataTables supports is the DOM you can use it to quickly enhance an existing table.

Pagination

Sometimes a long table can be unwieldy, with the pagination options in DataTables you can specify how many entries to show by default and how the next/previous page options should be presented to the user.  Of Course you can disable the pagination to display the data in one full table.

Sortable columns

One of the most useful features DataTables has is its ability to allow the user to order the table by any column by simply clicking on that column’s header, an action that has become second nature to a lot of users.  It’s also possible to provide custom sorting functions for columns if the standard sorting options don’t work for the data they contain.

Instant search

As users type their search terms into the search box DataTables hides table rows that don’t meet the current search criteria.

Extensible

There are a number of extensions available for DataTables that enhance its features, from allowing users to reorder the column by dragging their headers about, to adding the option for users to export the table to the clipboard or exporting it as a CSV, XLS or even a PDF.

Very Extensible

If you have some bespoke functionality required then you can use its plug-in Architecture to create your own plugin to meet it.

A LORLS virtual machine image

We’ve been thinking recently about how to make it easier for people to try out the LORLS code. This includes ourselves – we sometimes want to spin up a new instance of LORLS for testing some feature or helping another site debug their installation. Normally that would mean doing an operating system, Perl module and then LORLS installation on a new machine (physical or virtual) before it could be used.

With the spread of virtual machine (VM) infrastructure, and the fact that many Universities now use VMs widely, we thought it might be worth making a “LORLS in a box” VM appliance image that people could grab and then use for testing, demos or as the basis of their own installation. The VM image would have LORLS pre-installed along with all the basic Perl modules required in place, and the test data that we use in our local sandbox instance.

To that end, here’s a first cut of a LORLS in a box VM image. This is an OVA virtual machine image (both the VM and the disc image).  Be warned that its quite large (over 900MB!) as it has a full operating system disc image included – you might want to have a cup of tea and biscuit handy whilst it downloads. The image was built using the Virtual Box OSE VM platform and is based on a CentOS 6 Linux base, and it should be able to be imported into other VM infrastructures such as VMware.

Once you’ve imported the LORLs-in-a-box VM into your VM infrastructure and started it up, you’ll eventually be presented with a login on the console.  Your VM system may complain about not having a matching ethernet controller the first time you run the VM image – you can ignore this error as the LORLS VM image should work round it when it boots up.  Once booted, the console should also show you the IP address that the VM has picked up and the URL that you can use to get to the CLUMP web front end.  By default the networking in the VM is using a bridged interface that picks up an IPv4 address via DHCP.  This is fine for testing and development, though if you’re using this image as the basis of a production system you’ll probably want to nip in and change this to a static IP address.

To login to the system there’s a “lorls” user with the password “lorls4you” (both without the quotes).  This user can then act as the superuser by using sudo.  The MySQL server on the machine has a root user password of “LUMPyStuff!” (again quoteless) should you wish to go in and tinker with the database directly.  You probably want to change all these passwords (and Linux root password) as soon as you can as everyone now knows them!  You’ll also most probably want to edit /usr/local/LUMP/LUMP.pm file to point at your own site’s Z39.50 server, etc.  There are a couple of demo LORLS web users hard coded into this demo system – user aker (password “demic”) is an academic that owns a reading list, and user “libby” (password “rarian”) is a library staff user.

Koha loan history script

We’ve been looking at intergrating the Koha LMS with the LORLS reading list management system, as we know of sites using Koha that are interested in LORLS (and also because we’re interested in seeing how Koha works!).  After getting the basic integration for looking up works using Koha’s Z39.50 server and finding out item holdings/availability working last week, the next thing to tackle was to get loan histories out of Koha.

We use loan histories in LORLS’s purchase predictor code. We need to be able to grab an XML feed of both current and old loan issues, which is then used to work out what the peak number of concurrent loans have been made for an item and thus whether it has had sufficient demand to warrant purchasing additional copies.

For the current loans we need to know the date and time they were issued and for old issues we want both the issue date/time and the return date/time. For both current and old loan issues we also want to know the type (status) of the item (“long_loan”, “short_loan”, etc) and which department the borrower of the loan came from. The latter is so that we can apportion purchasing costs between different departments for the cases where multiple modules include the same books on their reading lists.

The item status is fairly easy to do in Koha – we’d already created item types in Koha and these can easily be mapped in the Perl code that implements the XML API into the long_loan, short_loan, week_loan, etc status format our purchase predictor code already expects.  Indeed if we wanted to we could make the items types in Koha just be “long_loan”, “short_loan” and “week_loan” so no mapping would be required, but a mapping function adds a bit of flexibility.

The borrowers’ department is a bit more involved.  It appears that in Koha this would be an “extended attribute” which needs to be enabled (it doesn’t appear to be on by default).  I created an extended borrower attribute type of called DEPT, and then entered some of Loughborough’s department codes as a controlled vocabulary for it.  In real life these would have to be slipped into Koha as part of a regularly (probably daily) borrower upload from our central reservation systems, which is roughly how we do it with our production Aleph LMS.  In our test environment I just added the extended attribute value manual to a couple of test users so that we could play with the code.

At the end of this posting you’ll find the resulting Perl code for creating this simple XML feed, which Koha sites might find handy even if they don’t use LORLS. One interesting thing to note in Koha is that the isbn field of the biblioitems table appears to contain more than one ISBN, separated by white space and vertical bar character (” | “). This means that you need to do a “like” match on the ISBN. This was a little unexpected and took me a while to track down what was wrong with my SQL when I had a simple “=” rather than a “like” in the select statements! The separation of biolios (works), biblioitems (manifestations) and items (er, items) is nicely done though.


#!/usr/bin/perl

use strict;
use lib '/usr/share/koha/lib';
use CGI;
use DBI;
use C4::Context;

$| = 1;
my $q = new CGI;
my $isbn = $q->param('isbn');
my $dept_code = $q->param('dept_code');
my $return_period = $q->param('return_period') || 365;

print STDOUT "Content-type: text/xmlnn";
print STDOUT "<loan_history>n";

my $status_map = {
    'BK' => 'long_loan',
    'SL BOOK' => 'short loan',
    'WL BOOK' => 'week loan',
    'REFBOOK' => 'reference',
};

my $dbh = C4::Context->dbh;
my $total_number = 0;

my $sql =
    'select items.itemnumber, issues.issuedate, biblioitems.itemtype, ' .
    '       borrower_attributes.attribute ' .
    'from biblioitems, items, issues, borrowers, borrower_attributes, ' .
    '     borrower_attribute_types ' .
    'where biblioitems.isbn like ' . $dbh->quote("%$isbn%") . ' and ' .
    '      biblioitems.biblioitemnumber = items.biblioitemnumber and ' .
    '      items.itemnumber = issues.itemnumber and ' .
    '      issues.borrowernumber = borrowers.borrowernumber and ' .
    '      borrowers.borrowernumber = borrower_attributes.borrowernumber and '.
    '      borrower_attributes.code = borrower_attribute_types.code and ' .
    '      borrower_attribute_types.description = "Department" and ' .
    '      borrower_attributes.attribute = ' . $dbh->quote($dept_code);

my $currentloan = $dbh->prepare($sql);
$currentloan->execute;
while (my ($id, $issuedate, $status, $bor_type) = $currentloan->fetchrow_array) {
    $status = $status_map->{$status};
    print STDOUT "  <loan>n";
    print STDOUT "    <issue_date>$issuedate</issue_date>n";
    print STDOUT "    <status>$status</status>n";
    print STDOUT "    <dept_code>$bor_type</dept_code>n";
    print STDOUT "    <number>1</number>n";
    print STDOUT "    <current>Y<current>n";
    print STDOUT "  </loan>n";
$total_number++;
}
$currentloan->finish;

my($sec,$min,$hour,$mday,$mon,$year,$wday,$yday) = gmtime(time-($return_period * 24 * 60 * 60));
$year += 1900;
$mon++;
my $target_date = sprintf("%04d%02d%02d",$year,$mon,$mday);

$sql =
    'select items.itemnumber, old_issues.issuedate, old_issues.returndate, ' .
    '       biblioitems.itemtype, borrower_attributes.attribute ' .
    'from biblioitems, items, old_issues, borrowers, borrower_attributes, ' .
    '     borrower_attribute_types ' .
    'where biblioitems.isbn like ' . $dbh->quote("%$isbn%") . ' and ' .
    '      biblioitems.biblioitemnumber = items.biblioitemnumber and ' .
    '      items.itemnumber = old_issues.itemnumber and ' .
    '      old_issues.returndate > ' . $dbh->quote($target_date) . ' and ' .
    '      old_issues.borrowernumber = borrowers.borrowernumber and ' .
    '      borrowers.borrowernumber = borrower_attributes.borrowernumber and '.
    '      borrower_attributes.code = borrower_attribute_types.code and ' .
    '      borrower_attribute_types.description = "Department" and ' .
    '      borrower_attributes.attribute = ' . $dbh->quote($dept_code);
my $pastloan = $dbh->prepare($sql);

$pastloan->execute;
while (my ($id, $issuedate, $return_date, $status, $bor_type) = $pastloan->fetchrow_array)
{
    $status = $status_map->{$status};
    print STDOUT "  <loan>n";
    print STDOUT "    <issue_date>$issuedate</issue_date>n";
    print STDOUT "    <return_date>$returndate</return_date>n";
    print STDOUT "    <status>$status</status>n";
    print STDOUT "    <dept_code>$bor_type</dept_code>n";
    print STDOUT "    <number>1</number>n";
    print STDOUT "    <current>N<current>n";
    print STDOUT "  </loan>n";
    $total_number++;
}
$pastloan->finish;

print STDOUT " <total_loan_count>$total_number</total_loan_count>n";

print STDOUT "</loan_history>n";

Go to Top