LUMP

Finding things that aren’t there

Jason noticed a bug in LUMP last week (just one I hear you cry? 🙂 ).  When non-admin/non-library staff created a structural unit for something like a book, no “Held by Library” flag data elements would be created, as these have a data type that lives in a data type group that the academics don’t have edit rights to.  This meant that backend scripts looking for works with this flag set to “N” were missing records that didn’t have the flag at all.

The long-term fix for this was relatively easy – I just got EditStructuralUnit to create data elements filled in with the default value, if such a thing exists in the data type table, for any data type groups associated with the structural unit type of the structural unit being created (and breathe… honestly it makes more sense than it sounds like it does!).  It does this even if the user creating the structural unit doesn’t normally have edit rights to data types in some data type groups.  This should be quite safe, as we’re creating a new structural unit, and having defaulted data elements where possible should make library staff’s lives easier.

However there was still a gotcha – we needed to go and find all the structural units that were missing their “Held by Library” data elements and create a suitable data element filled in with the default (in our case “N”, as we assume we don’t hold works until proven otherwise).  Now I could have knocked out a Perl script with several selects and subqueries in it, but I decided to exercise the old SQL muscle and see if I could do it straight into the database, preferably in a single SQL statement.  So after some cogitating, here’s the resulting SQL:

insert into data_element (structural_unit_id, data_type_id, value)
select su.id, dt.id, dt.default_value
from structural_unit as su
inner join data_type_group as dtg
  on su.structural_unit_type_id = dtg.structural_unit_type_id
inner join data_type as dt
  on dt.data_type_group_id = dtg.id
  and dt.name = 'Held by Library'
left outer join data_element as de
  on de.data_type_id = dt.id
  and de.structural_unit_id = su.id
where su.structural_unit_type_id in (1)
  and de.id is null;

Hmm, that’s some serious joining going on there!  The basic idea is that we do normal inner joins to link structural units, data type groups and data types together, and then a left outer join to spot data elements.  Because the left outer join leaves the data element columns NULL on rows where no matching data element exists, we can use that in the where clause to limit the output to just the missing structural unit and data type information (plus a limitation on the types of structural unit we’re interested in – in this case just books).  This is then used directly in an insert.
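
Before letting the insert loose it’s worth a dry run – the same join with a count tells you how many data elements you’re about to create.  Here’s a quick way to do that from Perl (a sketch, assuming a connected DBI handle in $dbh):

my ($missing) = $dbh->selectrow_array(q{
    select count(*)
    from structural_unit as su
    inner join data_type_group as dtg
      on su.structural_unit_type_id = dtg.structural_unit_type_id
    inner join data_type as dt
      on dt.data_type_group_id = dtg.id
      and dt.name = 'Held by Library'
    left outer join data_element as de
      on de.data_type_id = dt.id
      and de.structural_unit_id = su.id
    where su.structural_unit_type_id in (1)
      and de.id is null
});
print "About to create $missing data elements\n";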

Phew!  Seems to work though – we found 434 books on the live system that this corrects.  Relatively quick as well – less than 20 seconds on our production database. If anyone can see any better way to do this (or issues with this one!) please leave a comment.

Still, similar SQL can now be used on other structural unit types and other data types as well.  But I think I need a walk round the library first… 😉

Speeding up liststats

The LibrarianCentre contains a script called liststats that trawls back through editing events in the LUMP database and tries to determine how many lists in each department have been edited by which types of user.  Our library staff use the resulting statistics for a variety of tasks, such as distributing workload amongst teams and planning future workloads.

The trouble is that liststats can be a bit slow, especially on a busy production database.  Today I’ve been doing some tweaking to speed it up, and got some pretty good results (mostly by creating some new methods to find the maximum priority a user has, either associated with a particular SU ID or in general).

All well and good, but it still takes some time to run, so I decided to be flash and put a progress bar on the page whilst the script is running.  Hmmm… but the liststats script is just a plain old bog-standard CGI script written in Perl, rather than one of Jason’s whizzy AJAX Javascript jobs.  A bit of hacking reveals that you can run Javascript whilst a page is loading though… so a bit of sneaky Javascript and CSS gives us a nice graphical progress bar (with a textual incrementing percentage completed for folk with images/CSS turned off or using screen readers).  Cool – worked fine on our test server.
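
The guts of the trick look something like this (a sketch rather than the real liststats code – the element names and the chunking are illustrative):

#!/usr/bin/perl
use strict;
use CGI;

my $q = CGI->new;
$| = 1;                      # unbuffer STDOUT so output reaches the browser as we go
print $q->header(-type => 'text/html');
print <<'HTML';
<html><body>
<div style="border: 1px solid #000; width: 20em;">
  <div id="bar" style="width: 0%; background: #669; height: 1em;"></div>
</div>
<div id="pct">0%</div>
HTML

my $total = 200;             # e.g. number of lists to trawl through
for my $n (1 .. $total) {
    # ... process the next chunk of editing events here ...
    my $pct = int(100 * $n / $total);
    # Browsers execute each script block as it arrives, even mid-load:
    print qq{<script type="text/javascript">
        document.getElementById("bar").style.width = "$pct%";
        document.getElementById("pct").firstChild.nodeValue = "$pct%";
    </script>\n};
}
print "<p>Done.</p></body></html>\n";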

Then I moved it to our live server and all my progress bar loveliness disappeared.  What?!  Turns out that whilst Apache 2.x does unbuffered output by default (unlike some of the older Apache 1.x web servers of yesteryear that needed nph- scripts to do it), if you’ve got on-the-fly compression turned on, the buffering is turned back on.  Rats.  Luckily there’s a handy way of turning it back off for particular scripts (so I don’t cock up Jason’s AJAX calls that benefit from compression, especially on IE with its low connection limit):

SetEnvIfNoCase Request_URI liststats.test no-gzip dont-vary

This just needs to sit somewhere appropriate in the HTTPd config files – on our CentOS box it’s in the included ssl.conf config file fragment.  Everything works again – hoorah!

Perl ZOOM issues

We’re getting to the point at Loughborough where we’re considering “going live” early next year with LUMP, replacing the existing LORLSv5 install that we have as our current production reading list system.  As such, we’ve just spun up a new virtual server today to do a test LUMP install on.  This machine has a fresh CentOS installation on it, so needs all the Perl modules loaded.  As we use Net::Z3950::ZOOM now, this was one of the modules installed (along with a current YAZ toolchain).

Once we’d got the basic LUMP/CLUMP code base installed on the machine I grabbed the existing LORLS database from the machine it resides on, plus the /usr/local/ReadingLists directory from the LORLSv5 install on there, in order to run the create_structures LUMP importer script.  Which then barfed, complaining that “Net::Z3950::RecordSyntax::USMARC” was a bareword that couldn’t be used with strict subs (LUMP, and LORLS before it, makes use of Perl’s “use strict” feature to sanity check code).

Hmm… odd – this problem hadn’t arisen before, and indeed the error appeared to be in the old LORLSv5 ReadingListsItem.pm module, not any of the LUMP code.  A bit of delving into the modules eventually turned up the culprit: the new Net::Z3950::ZOOM doesn’t do backward compatibility too well.  There was a load of code in /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/Net/Z3950.pm that appeared to implement the old pre-ZOOM Net::Z3950 subroutines, but it was all commented out.  I realised that we’d not had this issue before because I’d only run the importer on machines that already had LORLSv5 with an older copy of Net::Z3950 on them.

The “solution” was simply to uncomment all the subroutines under the Net::Z3950::RecordSyntax package.  The create_structures script doesn’t actually use any of the LORLSv5 Z3950 stuff anyway, so it’s not an issue – we just need the old LORLSv5 modules to install so that we can use them to access material in the old database.  I guess this goes to show the problems you can accrue when you reuse a namespace for a non-backwardly-compatible API.
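
For reference, the uncommented code amounts to something like this (a sketch from memory – the real Z3950.pm defines many more record syntaxes, and the exact values may well differ):

package Net::Z3950::RecordSyntax;
# Pre-ZOOM code refers to record syntaxes via barewords like
# Net::Z3950::RecordSyntax::USMARC; under "use strict" those only
# compile if constants (subroutines) of that name actually exist.
use constant USMARC => 'usmarc';
use constant SUTRS  => 'sutrs';
use constant GRS1   => 'grs-1';
1;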

Adding and removing group members

Jason and I have been batting code back and forth for the last couple of weeks to provide an API and CLUMP interface to adding and removing people from usergroups.  We’ve gone through several iterations of designs and implementations and now have something that seems to be working OK on our development installation – hopefully soon to appear on the demo sandbox (unless Gary decides that he wants it to work in a different way! 😉 ).

With that more or less done, and some POD documentation tweaks (‘cos I’d been doing a bit too much cutting and pasting between scripts!), I can now go back to dealing with some of the backend management scripts.  The first two of these will be a simple link checker and a cron-able script to (re-)populate the “held by library” information.  Gary has a couple more reports that Library staff have asked for, but they might just require a couple of APIs to be produced to allow existing reporting systems to work (they weren’t part of the distributed LORLS code base – just local Loughborough reports).

Gary has presented the new system to several groups recently with mostly positive feedback.  We’ve just installed a link to this blog (and thus the demo sandbox system) into the live LORLS installation’s managelist script at Loughborough so that more academics will get a heads-up that something new is around the corner.

Is this thing on?

Since we released the LUMP sandbox on to an unsuspecting and innocent world a week or two ago, we’ve noticed that quite a few people have been playing with it.  Which is great – it’s just what we hoped folk would do.  Have a footle, take it for a spin round the block, kick the tyres, etc, etc.

However there’s been a resounding silence on the feedback front.  This means that either:

  • folk are so awestruck by its ground-breaking novelty that they don’t know what to say,
  • or it just works so nobody has any bug reports or enhancement suggestions,
  • or it’s so shoddy nobody thinks it’s worth bothering with,
  • or we released it at a really stoopid time when most University support staff are (like us) preparing for the onslaught of the hordes.

If anyone who has played with the sandbox does have any feedback we’d love to hear it… even if it’s just ‘meh’… 🙂

Infrastructure for usergroup management

I’ve been doing a bit of backend infrastructure creation on LUMP today – a set of seven CGI scripts to help with managing usergroup memberships.  These are:

  • Usergroups4SU – provide a list of usergroups that are involved in ACLs for a given structural unit.
  • Members4Usergroup – provide a list of users that are members of a given usergroup.
  • UserGroupMembership – provide a list of usergroups that a user is a member of.
  • Editing/AddUser2Usergroup – add a user to a given usergroup.
  • Editing/RemoveUserFromUsergroup – remove a user from a given usergroup.
  • Editing/AddUsergroup – create a new usergroup.
  • Editing/RemoveUsergroup – remove an empty usergroup (ie one with no members).

Jason is going to take these in the next week and see if they are sufficient to allow CLUMP to provide a nice graphical interface for staff, librarians and administrators to add and remove users from usergroups.  We’ve already got the Editing/EditACLs API CGI script though, so Jason can already tinker with the ability of usergroups to view and/or edit structural units.  We might find that we need some other API calls for admin usage, but this little lot should give us quite a bit of functionality.
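
By way of illustration, calling one of these from Perl looks roughly like this (the URL and parameter names here are made up for the example – the real scripts define their own):

use strict;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
# Hypothetical location and parameters, purely for illustration:
my $resp = $ua->post(
    'http://lump.example.ac.uk/cgi-bin/Editing/AddUser2Usergroup',
    { user_id => 42, usergroup_id => 7 },
);
die 'API call failed: ' . $resp->status_line unless $resp->is_success;
print $resp->content;   # the API CGIs reply with an XML status document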

Librarian Centre CGIs – departmental breakdowns

Today was a bit of a change of pace in LUMP development – it was time to whizz up a backroom script for the librarians, allowing them to see how many modules, reading lists and different types of items were used by departments in a particular year.  On the whole this was an “old skool” CGI script that doesn’t use the LUMP XML API – just straight calls to the underlying Perl modules that wrap round the LUMP database.  This is both quick to run and quick to code, and for a simple statistics script is fine (no need for flashy JavaScript for those really).  Gary is pondering other statistics scripts we might want to produce, both derived from the existing LORLS ones and also some new ideas.  He’s going to chat to some of our librarians about those first though – no point wasting time writing code to produce information that nobody is interested in!

Installer hits a sticky patch

You know, I loved CPAN until I started writing the LUMP installer, but now I’m not so keen!

It looks like our cunning plan of allowing people to install the required modules to run LUMP from CPAN using a point-n-dribble web front end might have come unstuck, for a couple of reasons.  The major one is Perl modules that use XS relying on C code/libraries/etc that might not actually be installed on the target machine.  For example we need to use the ZOOM Perl module for Z39.50 access.  This in turn makes use of the YAZ toolkit, which you either need to build from source or get an RPM for… and there aren’t any RPMs in the standard CentOS repos (you can pull CentOS RPMs from Index Data’s web site, but not get them via yum).  Similar requirements for Expat, although at least that has yum’ed packages in CentOS.  I hadn’t noticed either of these because on my Debian Lenny based workstation there were already handy packages in place, and the Perl modules that I’d been “needing” were pure Perl.  I only stumbled across the issue when Jason created a “sandbox” virtual machine running CentOS and I tried to run the installer on there.  Even if I could guess the package manager in use on the machine (yum, apt, etc) I’d need to ask the Admin for the system root password and then have to deal with the out-of-repo code such as YAZ.  At this point the law of diminishing returns kicked in and we decided we’d wasted enough time trying to make a user-friendly installer; it wasn’t going to be easy to get round these problems.

Still, all is not lost: at least the installer code can spot which modules are missing so that you can install them by hand.  In fact I’ve left in the code to install private CPAN modules – we might as well do as much as possible for the prospective Admin, even if we do then need to tell them to break out the C compiler and build some support software.  And of course the installer can still do everything else once the required Perl modules are in place – create the DB, import test data, and retrieve and configure the various scripts required.

I also accidentally stumbled across another problem today: DBD::mysql doesn’t allow column filters in the column_info() DBI method.  I was using that to check whether the LUMP database schema on the target machine (if present) was out of date compared to the schema embedded in the installer.  This is a bit of a pain, but relatively simple to code round.  The DBI manual page implies that if a DBD module can’t do filtering you might have to handle it yourself: DBD::mysql just barfs with an error message, so I’ve had to embed the call in eval() checks just to be on the safe side.
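
The defensive version ends up looking something like this (a sketch – $db_name, $table and $wanted are illustrative names):

# Try the filtered call first; if DBD::mysql barfs, fetch all the
# columns with a wildcard and do the filtering ourselves.
my $sth = eval { $dbh->column_info(undef, $db_name, $table, $wanted) };
if ($@ or not defined $sth) {
    $sth = $dbh->column_info(undef, $db_name, $table, '%');
}
while (my $col = $sth->fetchrow_hashref) {
    next unless $col->{COLUMN_NAME} eq $wanted;
    # Compare $col->{TYPE_NAME}, $col->{COLUMN_SIZE}, etc against the
    # schema embedded in the installer.
}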

I’ve also just noticed that I’ve hard-coded the Kerberos TGT realm!  Oops – we’ll need to change that, especially as we’re just moving Active Directory trees here at Loughborough anyway!  We’ll probably also need to add in an override ability for folk with no AD or other Kerberized authentication infrastructure – that shouldn’t take too much work as it’s just a little tweak inside the LUMP.pm password validation method.  Hopefully in a week or so we’ll have a little demo system on our sandbox VM that we can then let people play with – a bit like the demo system that we already run for LORLS v5 (our “groundhog day” install, that gets blown away every morning in the wee hours and replaced with a fresh copy… in fact I used the LORLSv5 to LUMP importer to convert that database into our new format for this new sandbox machine as another useful test).

Installer progress…

Writing a web driven installer for LUMP in Perl is proving to be, er, “interesting”.  Today has seen some useful progress and it’s probably worth noting some of the tricks used (mostly because they’re rather esoteric and bug hunting might be fun in a few years’ time when I’ve forgotten why I did what I did!).

The first thing I had to get going was bootstrapping the required CPAN modules as documented in my previous post.  This is now working OK.  A few gotchas to note:

  • When you fork off a child process from a CGI script that you want to be “long lived” (ie outlive the parent process that has returned some HTML to the user), you need to close down STDOUT, STDIN and STDERR, otherwise Apache hangs around waiting for the child to finish.  This can actually be handy, as you can then reopen file handles with those names pointing elsewhere (such as to a session file) and record the output of other tools that the child calls upon (such as the very chatty CPAN::Shell) – there’s a sketch of this after the list.
  • CPAN::Shell doesn’t appear to return a useful error status, so programmatically you can’t tell if your module install worked or not (the verbose output is designed to tell a human).  To find out if a module is installed, you have to try to “use” that module again – once installed the use statement should work.
  • Talking of use statements, it’s worth noting that these are best done inside eval()’s, as that allows you to capture errors (as in the case of a module that has failed to install).  You have to be careful whether you have compile- or run-time binding though, especially if you’re eval()ing stuff with variables in it that you’re planning on instantiating during the run.  perl -cw is your friend as usual, warning you about dodgy compile time issues.
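
To illustrate the first of those gotchas, the forking code ends up looking roughly like this (a sketch – $session_file and the work function are illustrative):

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Child: cut the ties to Apache so the parent's page can complete...
    close(STDIN);
    close(STDOUT);
    close(STDERR);
    # ...then reuse the names to capture the chatty output somewhere useful.
    open(STDOUT, '>>', $session_file) or die;
    open(STDERR, '>&', \*STDOUT)      or die;
    install_modules_via_cpan_shell();   # illustrative long-running work
    exit 0;
}
# Parent: return HTML to the user straight away; a later request can
# poll $session_file to see how the child is getting on.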

Once I could successfully install CPAN modules into a private LUMP modules directory, the next thing to consider was the database.  The installer CGI script asks the user for the database name, database account and database password to use, and then checks to see if it can gain access.  For most noobs this won’t work as they won’t have a database/account/password, so the CGI script traps that error and asks if they’d like it to create the database for them.  This requires them supplying the database root password, and then (assuming it’s right) the script creates the database and grants SELECT/INSERT rights on it to the named user, identified by the given password.
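
That creation step boils down to a few lines of DBI (a sketch with the error handling trimmed; variable names are illustrative):

use DBI;

# Connect as root in order to create the database and grant rights.
my $root = DBI->connect('DBI:mysql:database=mysql', 'root', $root_pw,
                        { RaiseError => 1 });
$root->do('CREATE DATABASE ' . $root->quote_identifier($db_name));

# The unprivileged user only gets SELECT and INSERT.
my $user = $root->quote($db_user);
my $pass = $root->quote($db_pass);
$root->do("GRANT SELECT, INSERT ON $db_name.* TO $user\@'localhost'"
          . " IDENTIFIED BY $pass");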

The CGI script then checks that this user can connect to the database – if they can’t then something has gone wrong with the basic creation and we have to tell them to fiddle by hand.  Assuming that is OK, it then changes back to the database root user to do the schema creation (as the user only has SELECT/INSERT privs).  This however called for another hack: we’ve kept the database schema and index definitions in SQL text files.  During development we’ve just source’d these in with the mysql command line client – but this won’t work from Perl’s DBI interface and I don’t want to rely on finding the mysql command line client and system()’ing out to it.

So my next trick was another Perl script: installer_maker.  This is a simple filter script that reads a “template” installer script in and writes out a proper installer, with a couple of lines in the template replaced by some Perl array structures – one for the table schema and one for the indexes.  Each array element contains one SQL statement that installer_maker has ripped out of the LUMP schema and index SQL text files.  Bingo!  We can now carry on tweaking the schema/indexes in the files we’re used to in development, whilst not having to remember to do lots of code twiddling to keep the installer in sync – we just need to run installer_maker before making a distribution.  Hopefully that should result in less chance of our dev schemas getting out of step with released production schemas, and the bugs that could result from that.
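
The guts of installer_maker are pleasingly dull (a sketch – the marker line is made up, and the q{} quoting assumes the SQL contains no unbalanced braces):

use strict;

# Slurp the schema and split it into individual SQL statements.
open(my $fh, '<', 'lump_schema.sql') or die "no schema: $!";
my $sql = do { local $/; <$fh> };
close($fh);
my @statements = grep { /\S/ } split /;\s*\n/, $sql;

# Render them as a Perl array definition...
my $code = "my \@schema = (\n"
         . join(",\n", map { '    q{' . $_ . '}' } @statements)
         . "\n);\n";

# ...and splice that in over the marker line in the template installer.
open(my $in,  '<', 'installer.template') or die $!;
open(my $out, '>', 'installer')          or die $!;
while (my $line = <$in>) {
    $line = $code if $line =~ /^### SCHEMA GOES HERE ###/;
    print $out $line;
}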

So it’s looking promising now, and from a user perspective is probably far friendlier than the command line Perl installer from LORLS.  The next thing to write is a routine in the installer that can check the schema and indexes in a live database against the schema/indexes we have now got embedded in the installer.  If they differ in some way we need to tell the admin running the installer and let them decide whether to ignore the difference (bad idea probably), tweak it by hand or (preferably) let the installer tweak their schema/indexes.  Hopefully in the long term this will be a useful place to hang LUMP schema/index upgrades on – folk could use the installer not only to load new versions of the Perl code but also to update their database structure.  That’ll need some serious testing though… 🙂

LORLS’s Installer

We’re getting to the point with LUMP and CLUMP where Jason and I are thinking about installers.  If we were just going to use the code at Loughborough this would be a moot point – we’d just copy the various directories around and tweak under the hood to get the thing to work as we want, just as we have done during development.

However it’s not just going to be us that use it (or at least hopefully it won’t be!), so we need to think of a nice way for people to install it.  The previous versions of LORLS have a Perl command line script that I wrote years ago that asks lots of questions and then does its funky thang.  It works, but it could be friendlier, especially for newbie admins.  I’ve also seen a number of PHP based packages that allow you to do most of the installation via a web browser, which is rather nice for new admins, so I thought, “Hey, why not make an installer CGI script that assumes very little about the system other than that Perl is there, a MySQL database server is running, the CGI.pm module is available (it has been part of the Perl distribution since 5.004, so should already be there on most machines installed in the 21st century), and that the script itself is executable as a CGI?”  Why not indeed…

Obviously the prospective LUMP admin needs to install (or have installed for him) Perl, MySQL and a webserver configured to execute CGI scripts in the directory that we tell them to unpack LUMP into, but once that’s done surely we can then check to make sure that all the other modules are present, install them if they aren’t, set up the MySQL database ready for LUMP to use and then configure all the LUMP scripts so that they are ready to go, all with a nice point and drool web interface?

Checking if a Perl module is available is relatively easy, thanks to Perl’s eval.  For example, say we want to check if Data::Dumper is installed.  One gotcha: a use statement inside a block eval is still executed at compile time, so a missing module won’t be trapped – you need a string eval instead.  That can be done using some code such as:

eval "use Data::Dumper;";
if($@) {
  # Module missing
} else {
  # Module present, and available for use
}

Shimples!

However things start to get “interesting” if the module isn’t installed.  I thought this would be easy, as the Comprehensive Perl Archive Network (CPAN) has a nice module that I’ve used for years to interactively install and update Perl modules with – things like:

perl -MCPAN -e 'install Data::Dumper'

It has a programmatic option as well as an interactive side, so we’re sorted, right?  Well no, we’re not.  Unfortunately the programmatic side really just generates the same stream of stuff you see when you run the interactive side.  If you include something like:

my $result = CPAN::Shell->install('Data::Dumper');

in your CGI script, eventually you’ll get a result in the web browser of a load of unformatted raw text from this command interspersed with your intended HTML.  The $result variable on the other hand will stay completely empty, with no indication as to whether the installation has worked or not.  Pants – not what we want here.

The long and short of it is, to get round this “feature” in CPAN::Shell it seems that you have to do a bit of fork() action.  In other words, fork off a child process to run the CPAN::Shell method in and then, back in the parent, capture its STDOUT stream into a variable which can then be scanned with a regexp or two looking for signs of success in the output text.  Not terribly clean, but it works.
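
In outline the dance goes like this (a sketch – the success pattern is a guess at CPAN’s output, so check what your version actually prints):

use strict;
use CPAN;

# open '-|' forks, handing the parent a read handle on the child's STDOUT.
my $pid = open(my $child_out, '-|');
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: all of CPAN::Shell's chatter now flows down the pipe.
    CPAN::Shell->install('Data::Dumper');
    exit 0;
}

# Parent: slurp the output and scan it for signs of success.
my $output = do { local $/; <$child_out> };
close($child_out);
my $looks_ok = ($output =~ /install\s+--\s+OK/);   # pattern is a guess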

There’s another “gotcha” as well: the web server is unlikely to be running as root (or at least it shouldn’t be!) and so the CGI script can’t install Perl modules into the system library directories.  This is a more minor pain: you can tell CPAN::Shell that it should do a local installation to a place that the user executing it can write to.  So that’s another requirement for running this CGI script: create a directory that’s readable and writable by the user running the web server (usually called something like apache or http) but which isn’t part of the web server document root.  In other words, if the web server document root is /var/www/html, we might want to put this LUMP specific CPAN directory tree in /var/www/LUMPCPAN where it can’t be seen by web browsers.  You have to hack up a MyConfig.pm to put in this directory and then point @INC and $ENV{‘PERL5LIB’} towards it, but that can be done automagically by the CGI installer script once the directory exists.
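
Wiring the scripts up to that tree is then only a couple of lines (the exact subdirectory depends on how the local CPAN install was configured):

# Make the private module tree visible to this script and to children.
use lib '/var/www/LUMPCPAN/lib/perl5';
BEGIN { $ENV{PERL5LIB} = '/var/www/LUMPCPAN/lib/perl5'; }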

Now some readers (we do have readers, right?) might be wondering why I don’t use one of the fancy pants CPAN modules such as local::lib or Module::Install to do this, rather than tackling CPAN::Shell head on.  Well, it’s a chicken and egg problem really: I wanted to have the minimum requirements for the installer script to run, and if I’d asked the user to install a load of libraries and modules to make using CPAN easier I might as well have just given them a list of modules to install.  Which actually I have done anyway, just in case they want to install them system wide and/or don’t trust my installer script to get them right.  But nevertheless I’d like to give the newbies (and also folk who have sluggish server admins, if they don’t run their own boxes) the option of installing via the CGI installer script.  More work for me, but hopefully less for them.

So, it’s a bit of a kludge at the moment, but it seems to be the way to go unless anyone else can come up with a better scheme.  Once I’m happy it can get the required modules in place, my next trick will be to ask for database server details (database name, username, password, server, port) and then check if the database already exists, or create a fresh one.  In the former case I’m also planning on checking the schema in the database against the one in this LUMP distribution and then offering to update it if needs be, and allowing fresh databases to either be empty, have some simple test data in them or copy data from another database.  Hopefully I can also bolt the importer from older LORLS versions in here so that there’s no command line interaction required at all.  With a bit of luck all of those should be a lot less hassle than CPAN::Shell has proved to be.

I hope… 🙂
