LUMP

Speeding things up (or at least trying to)

Jason has been doing a load of work in the last few months on getting the CLUMP AJAX and its looking rather nice. However Gary and Jason had decided that on longer reading list the performance wasn’t good enough. A 635 item list took over 3 minutes to completely render when Jason pulled the Reading List SU and then did separate XML API calls to retrieve each individual item (the advantage of this over just using a GetStructuralUnit API call with multiple levels is that the users could be presented with some of the data asynchronously as it arrived, rather than having to wait for the back end to process the whole lot).

So the question was: could it be made faster? One option was trying to create a single “mega” SQL select to get the data but that could almost immediately be discounted as we pull different bits of data from different tables for different things in GetStructuralUnit (ie the child SU info and SU/SUT basic info couldn’t easily be munged in with the data element/data type/data type group stuff). So we did two separate selects to get the basic data, ignoring the ACL stuff. These were both subsecond responses on the MySQL server.

Now obviously turning the SQL results into XML has a bit of an overhead, as does the client-server comms, but nowhere near enough to cause this slow down. This pointed the finger firmly in the direction of the ACLs. Every time we pull an SU (or its children) we need to do a load of ACL checks to make sure that the user requesting the data is allowed to see/edit the SU and/or the data type group. When we added ACLs back into the new “fast” XML API the 635 element list took just under two minutes to render. So we’d shaved a third off the time by crafting the SQL rather than using the Perl objects behind the XML API, but it was still a bit slow.

Gary then came up with a bright idea: why not allow FastGetStructuralUnit (the new whizzy version of the GetStructuralUnit XML API CGI script) to accept more than one SU ID at once? That way Jason’s CLUMP AJAX front end could request the outer reading list SU quickly, and then fill in the items but do them in blocks of several at once. We implemented this and had a play around with different numbers of items in the blocks. Five seemed quite good – this was fast enought to fill in the first screenful to keep the user occupied and managed to get the whole 635 item list rendered in the browser in just over a minute – two thirds of the original time.

Jason can now also try out more advanced ideas in the future, such as dynamically altering the number of items requested in the blocks based on response time and whether the user is scroll down to them or not. With lists under 100 items we’re getting a sub-10 second rendering time, so that’s hopefully going to be fast enough for the majority of users… and may even encourage some academics with long and unwieldy reading lists to split them up in to smaller sub lists.

More ACL fun

After fiddling with ACLs for the last few weeks we had to reload the access_control_list table with the new style ACLs. Last week I’d written a “fixacls” perl script to do this and left it running. It was still running a week later when I came in this morning, and was only about half way through the >88K SU’s we’ve got. A quick back of the envelope calculation showed that with the new ACL format we’d be looking at north of 2 million ACLs for this many SUs.

So, time to find a way of speeding it up. I decided to try something I’d seen phpBB do: write all the imports for a table into a single SQL INSERT statement. This should then be able to read all data in and generate the index entries all in one go, which should be much faster than doing it row by row in individual SQL statements.

The ACL generation script took just over 6 hours (I didn’t time it exactly unfortunately, but it was ticking along at about 300000 ACLs per hour, with just over 2 million ACLs in total for the 88882 SUs we’ve got).

Reading this into the database was actually quite quick:

mysql> source /home/cojpk/LUMP/trunk/ImportScripts/acl.sql
Query OK, 2266017 rows affected (51.98 sec)
Records: 2266017 Duplicates: 0 Warnings: 0

Well, it was once I’d upped the max_packet_size in /etc/my.cnf to 100MB from 1MB! Looking promising now. A similar technique might be useful for a revamp of the LORLS import script – certainly the ACL fixup can now be run as part of that to cut down on time required. Still need to tweak the fixacls script to deal with LORLS reading lists that were marked as unpublished, but that should be a quick and easy hack.

Access Control Lists

Been head scratching about access control lists, which have proved more complicated than originally anticipated. The trouble comes from working out how to do inheritence (so that we don’t need thousands of similar ACLs differing only in their SUID) and also how to set the initial ACLs for a SU that has been newly created.

Current thinking is as follows:

The existing access_control_list table will be left as is, but will be used in a slightly different way. To find out if a user has rights to something, we’ll first search through the table for all rows that contain a user group that the user is in that has the maximum priority. The guest usergroup has a priority of 0, general registered users 1, module support staff 170, module tutors 180, module library assistance 190, module librarians 200 and sysadmins 255.

If no rows match, game over, the user doesn’t have any sort of access. If rows are returned, we then look at the SUID field, and potentially the DTG_ID field if we’ve had one specified. Rows with no SUID or DTG_ID field are the lowest prescidence, then ones with just a SUID provided, then ones with SUID and DTIG_ID at the top. At this point we can then look at the view and edit fields to see what folk are allowed to do with them.

We might also need to allow folk to edit ACLs. We’ll let them edit ACLs of any group with a lower priority than the one that they are in and that mentions a SUID and/or DTG_ID that they have edit rights to. The only exception is Sysadmins – we can edit our own ACLs and can also edit ACLs that have a NULL (ie wildcard) SUID or DTG_ID. We will also disallow new ACLs to be created that have the same <usergroup_id, su_id, dtg_id> triple so that we don’t have two rules with different edit/view options at the end of the access validation algorithm.

When a new SU is created, we’ll need to add some rights to it. These initial rights will have user groups inherited from the parent SU, but this is where things get tricky: as we’ve discovered we can’t just copy the ACLs, because child SUs will in general be of a different SUT than their parents.

To get round this we’ve proposed two new tables. The first of these is “acl_defaults”. This links user group priorities with SUT_IDs, DTG_IDs and default “view” and “edit” settings. The second table is less important and more for UI “gloss” when creating/editing default ACLs – it is merely a list of priorities (as its ID field) and a name – so that we know that (for example) priority 170 is assigned to user groups for Support Staff.

The process when a new SU is created is as follows: for each usergroup that has an ACL in the parent SU take that usergroup’s priority field and then look up the row(s) in the acl_defaults table that match that combined with the SUT_ID for the new SU. These rows are then used to create new rows in the access_control_list table with the specific SUID for the new SU filled in (and if specified in acl_defaults, the value for DTG_ID as well).

Client Performance

In the last month we’ve been getting to grips with client side coding. This has thrown up a few XML API calls that we decided we wanted or needed tweaking (such as checking authentications without returning anything from the database, listing all the structural unit types available and allowing substring searches in FindSUID). It had also given us pause for thought on the performance of the system.

The old LORLS database was pretty lean and mean: the front end code knows about the structure of the data in the database and there’s relatively few tables to have to handle. LUMP is more complex and thus more searches have to be done. Also by having an XML API we’re that bit more removed and there’s more middle ware code to run. We could just say “hang it” and ignore the XML API for a CGI based client which might speed things up. However we’ve been trying to “eat our own dogfood” and use the APIs we’re making available for others.

Some of the speed hacks for the imports won’t work with the clients – for example the CGI scripts that implement the XML API are short lived so the caching hacks wouldn’t work, even if the clients all did tend to ask for the same things (which will in fact be the case for things like the Institutation, Faculty, Department, etc information). One avenue that we could persue to help with this is mod_perl in Apache, so that the CGI scripts are turned into much longer lived mod_perl scripts that could then benefit from caching.

We’ve currently ended up with a sort of three pronged approach to clients code:

  • Jimll is writing a CGI based LUMP client that makes minimal use of JavaScript (mostly for the Google Book Search fancy pants bit really),
  • Jason has a basic PHP Moodle client to demonstrate embeddablity,
  • Jason is also writing a AJAXy client side tool.

All of these use the XML API. I guess we should also have a non-XML API client as a bench mark as well – the CGI based LUMP client could do that relaively easily in fact. Something to think about if performance really does turn out to be a dog.

We’ve also been considering the usefulness of the Display Format stuff held in the database. The CGI based LUMP client and the PHP Moodle client both use this. However the AJAX client retrieves raw XML documents from the server and then renders them itself. This might be a bit faster, but it does mean that the client becomes tied to the structure of data in the database (ie add a new Structural Unit Type and you might need to dick about with your client code(s) as well).

Speeding up the imports

A while since the last update and that’s mostly because we’ve been banging our heads against a speed issue for the import of the old LORLSv5 reading lists into LUMP.

The first cut of the importer seemed to work but had barfed due to lack of disc space on my workstation before completing the run. For dev/testing the rest of the API that was fine though as we had enough data to play with. It was only when we installed a new virtual hosting server and I created myself a dedicated test virtual server with oodles of RAM and disc space that we discovered that the import would work… but it would take around two weeks to do four years worth of data. Ah, not good. Especially as the virtual host is supposed to be relatively big and fast (OK its a virtual server so we can’t count spindles directly as the filesystem is stuffed inside another file on the RAID array on the host, but its should still be plenty fast enough for LUMP to run, otherwise folk with older server hardware are going to be stuffed).

We’ve tried a number of different options to help sort this out. These have included:

Search caching in Perl
Tweaking the BaseSQL module to allow a Perl hash based cache to be turned on and off (and the status of caching checked). This is then used by some of the higher layer modules that encapsulate a few of the tables (StructuralUnit, DataElement, DataType, DataTypeGroup) to see if a search matches a previous search and, if caching is turned on, returns the results immediately from the Perl hash without hitting the database. Any updates on the table in question invalidate the cache. Reading the cached copy is much faster than accessing the database and so this can be a big win, especially on tables where there are relatively infrequent updates. Unforunately we do quite a bit of updating on StructuralUnit and DataElement tables.
A reload() method
Quite a lot of the time we create new Perl objects on a database table to do (for example) a search and then later have to do another new() on the same object to instantiate it with an existing row from the table (based on the id field). Every new() method reinterrogates the database to find out the fields for the table concerned and then recreates the Perl object from scratch. However the fields are unlikely to change from call to call (certainly during an import) so this is just wasted time. A reload() method has been added so that you can instantiate the object from a known row in the database via the id field without having the whole Perl object regenerated or the data queried for the fields available. This results in a slight but noticeable saving.
Adding noatime and nodiratime mount options
Normally on a Linux ext3 filesystem (which is what we’re running), the access time of files and directories are updated each time they are accessed. Of course that means that every SQL SELECT is effectively also a write on the filesytems. The noatime and nodirtime directives to mount (slipped into /etc/fstab) turn this behaviour off. You don’t even have to reboot for it to come into effect – the mount -oremount / command is your friend! This should remove another disk related bottleneck during the import.
Tweaking the ACL CopyRights() method
Every time a row is inserted into the structural_unit table for a new SU, several rows get stuck into the data_element and access_control_list tables. Looking at the latter, during the import many of these rows are created as a result of the CopyRights() method on the AccessControlList Perl object. This method allows several fancy options such as overwriting existing ACLs and cascading ACLs from a parent to all its children. Neither of these directly apply in the case of building a new SU and copying the access rights from its direct parent, yet we still had to do a load of SQL to support them. Therefore a new parameter called “new” was added to the method to indicated that the ACLs were being copied for a new SU, which allowed some short cutting. One part of this sort cutting was to use a single INSERT INTO...SELECT FROM... SQL construct. This should be fast because it is just copying internally in the database engine (using a temporary table as both the source and target tables are the same in our case) and doesn’t need to have results send to/from the Perl script. This appears to be quite a big win – performance with this and the previous two tweaks now hits 1000+ SUs and associated data being created every 10 minutes or so.

Whilst these tweaks look to be making bulk importing data from LORLSv5 more manageable, it remains to be seen if they ensure that the performance of the interactive editing and access acceptable. Hopefully now I can get back to making a non-Moodle front end and we’ll see!

Editing API

With EditDataElement under our belt, its time to turn our thoughts to more of the editing/creating side of the system. Gary, Jason and I had a sit down thinking session this afternoon and came up with four things we need an API for:

  1. Given a SUT ID (or a SU ID from which we can find a SUT ID), what SUTs can be children? We need this in order to produce a drop down list of possible SUTs that can be added under an SU in a editor,
  2. Given a SUT ID, return an XML structure describing the data type groups and data types that can be under it. This is needed so that an editor can work out what should be in the form, what format it has, etc,
  3. Create an SU with a given SUT ID (and possibly use the same API for editing an SU with a given SU ID?). This needs to be told the parent SU ID and check that the requested SUT ID is a valid child, and also that the user has permission to create children. If everything is OK it generates a new SU of the given type and links it to the parent SU in structural_unit_link table. It then has to copy the ACLs from the parent SU to the new SU before returning the new SU ID.
  4. An API for managing ACLs. Needs to allow new ACL lines to be added and existing ones changed and/or deleted. We had a talk about how we envisage ACLs working and it seems to be OK as far as we can see: ACLs on new SUs will by default inherit from the parent SU so that the same people have the same rights (ie if you can view or edit a reading list, you’ll be able view or edit a book attached to it). Users can remove ACLs for groups of the same rank or lower than they are (so academics can’t remove the Sysadmin or Departmental Admin groups rights for example).

This evening a first cut at the first two of those has been knocked out. This should let Jason play with front end form ideas for the editors, even if he can’t actually create new SUs yet. Creating SUs shouldn’t be a big deal (its very similar to EditDataElement) but the ACL management API is going to have to be carefully thought out, especially if we might want to add new capabilities in the future (for example a “can_create_children” ACL, so that you can edit an existing SU but not add structure beneath it, and maybe a “can_delete” one as well so that Academics can allow research students to tweak typos in their lists but not add or remove items). Another suggestion from Gary was a “can_publish” ACL type so that only specified folk can authorise the publication/unpublication of an SU (and its children).

Talking of deleting, we also tweaked the structural_unit table today by added two new attributes: deleted and status. The deleted attribute indicates whether an SU has been, well, deleted. We don’t want to actually do a database delete as academics and librarians have in the past had “ID10T” moments and deleted stuff they shouldn’t, and getting it back from backups is a pain. By having a simple flag we can allow sysadmins to “undelete” easily – maybe with a periodic backend script that really deletes SUs that were flagged as deleted several months ago.

The status attribute allows us to flag the publication status of the SU. By default this will start as “draft” and so we can ensure that student facing front ends don’t see it. When ready it can be changed to “published” which allows it to be viewed by guests normally (subject to ACLs of course). Lastly there is a “suppressed” status that is intended to allow published SUs to be temporarily unpublished – for example if a module is not running for one year but will need to reappear the next. Not entirely sure if the status attribute isn’t just replicating what we can do with ACLs though – I’ll need to chew that one over with the lads.

Surviving Database problems

The importation of LORLS v5 code has hit a snag: the object oriented nature of the Perl code created large numbers of connections and would eventually, after processing a largish number of reading lists, blow up because the MySQL database would not handle any more connections. A way round this is to replace the Perl DBI connect() method with a similar call to Perl’s DBIx modules – DBIx supports reconnections.

Some cunning tweaking of the BaseSQL init() method is also required so that we can keep on reusing the same database connection over and over again, as there’s little point generating a new one for each Perl module in use. BaseSQL now use a cunning Perl hack based on the behavour of unnamed embedded subroutines (ie closures) and variable scoping/referencing. This allows a connection to be set up if one exists, but also allows the database handle for the connection to be shared amongst all instances of the Perl modules that inherit from BaseSQL in a process. This means that the LUMP side of things only uses at most one MySQL database connection, and will reconnect if the database goes away (it’ll try 500 times before it gives up, so that covers the database being restarted during log rotations for example, which is what originally highlighted this problem).

However all is not rosy in the garden of SQL: the two old LORLSv5 modules that are needed to read the old reading lists can also generate large numbers of connections. I’m experimenting with closing the handles they create as soon as I’ve issued a new() method call and then telling them to use a hand crafted DBIx connection that is already set up. Seems to work but I keep finding more bits where it sets up new connections unexpectedly – not helped by the recursive nature of the build_structures LUMP import script. Aaggghhhh! :-)

Creating Indexes

Whilst writing a little dev script to nullify Moodle IDs for Jason, I realised that things could be a bit slow sometimes searching for SUs when the system had a large number of reading lists in it. Once again I’d made the school boy error of forgetting to create indexes in the DB schema for the foreign keys in the various tables. To that end the schema now has indexes on any ID fields in any tables, plus a couple of obvious ones we might want to search on. We’ll probably have to add more in the future, especially on large tables (structural_unit, data_element, access_control_list, user, usergroup for example).

Also been trying to track down uninitialised variables in the build_structures LORLSv4 import script – it works at the moment but looks messy and I’d rather have properly error checking code.

The EditDataElement API and Moodle IDs

Today we added the first web API call that could actually change things in the LUMP database (as opposed to just reading from it). The driver for this came from integration with Moodle. Today we’d got a LUMP reading list to display from within Moodle, but we needed to be able to link a Moodle ID to a reading list SU. The resulting EditDataElement API call allows some flexibility in how the data element in question is specified. It could be by the data element’s ID, in which case the value is simply updated. However the client code might not know the data element ID or it might require a new one, so it can create/edit data elements by specifying the required new value, the ID of the SU that it is within and the data type name of the data element.

One subtle finese required is that some DTs are not repeatable but others are. For repeatable DTs the SU ID and DT name can be used to simply create a new DE. However for non repeatable DTs, a flag can be provided to indicate whether to replace any existing value. If a DE of the required DT already exists in this SU and this flag is not set to ‘Y’, a permission denied error is returned.

Tweaking output when rendering

Had to slightly change the output XML when using GetStructuralUnit CGI script API. Originally if asked for HTML the code would just generate the HTML (or whatever format) for each child to the required depth level and just concatenate it. However this meant that higher levels couldn’t make formatting decisions based on details of lower levels – for example reading list SUs need to know when to turn off/on list tags in HTML depend on if children are notes or book/journal/article/etc SUs. The tweak was to always have the child return a structure:

$child->{su}->{render} = formatted output $child->{su}->{id} = child SU's IDThe child SU's ID field lets the parent's code make more intelligent rendering decisions.

Go to Top