Identifying multibyte UTF-8 characters in PostgreSQL

This afternoon I had to find a quick way to identify which rows in a PostgreSQL table contained multibyte UTF-8 characters.  Luckily PostgreSQL supports a number of string functions, one of which is char_length, which returns the number of characters in a string.  Another is octet_length, which returns the number of bytes in a string.  For plain ASCII strings these will be the same, but for any string containing multibyte UTF-8 characters they will differ.  Using these functions I ended up with some SQL based on the following query:

SELECT id, text_value FROM metadatavalue WHERE char_length(text_value)!=octet_length(text_value)
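As a quick illustration (assuming a UTF-8 encoded database; the sample string is mine, not from the table above), the two functions disagree as soon as a multibyte character appears:

SELECT char_length('café') AS characters, octet_length('café') AS bytes;
-- characters = 4, bytes = 5, because 'é' occupies two bytes in UTF-8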

Talking to Nessus 6 API

Nessus 6 has an exciting new API… which means that systems that rely on the old XML API, and things like the Perl modules that encapsulate it, are going to take a bit of work to port to the brave new world.  My chum Niraj is looking at doing that here, but wanted an example of the new API in use that he could build on.  Luckily I’ve done more Perl API hacking with JSON than a person should admit to, so it didn’t take long to knock one up for him.  I thought I’d share it here for other people’s edification (and also to stop Gary, my boss, saying that I don’t blog enough).

Hopefully the comments will help explain what is going on, but the overview is that the code is really in two parts. The first part creates a session for a given username and password; the Nessus API server hands back a token to say, “Yes, you’ve logged in”. This token is then used in a custom HTTP header in the second part of the example to request a list of scans. The scans come back in JSON format, so we turn them into a Perl data structure using the JSON.pm Perl module and then do whatever funky thing we intended to do with the data. I hope that’s clear! 🙂

#!/usr/bin/perl

use strict;
use LWP;
use LWP::UserAgent;
use JSON;
use Data::Dumper;

# This line is a hack to deal with the duff SSL certificates on the test server
BEGIN { $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0 }

# Where we talk to
my $apibaseurl = 'https://nessus.example.com:8034/';

# Get the username and password to use
my $username = shift || 'mrtester';
my $password = shift || 'testytesting123';

# Create a user agent (web browser) to talk to the Nessus server
my $ua = LWP::UserAgent->new;
$ua->agent("LboroScanHack/0.1");

# Create the POST request that encodes the username and password
my $req = HTTP::Request->new(POST => $apibaseurl . 'session');
$req->content_type('application/x-www-form-urlencoded');
$req->content("username=$username&password=$password");

# Make the request of the Nessus server
my $res = $ua->request($req);

# See if it worked and if so, get the session token
my $NessusToken = '';
if ($res->is_success) {
    my $result = from_json($res->content);
    $NessusToken = $result->{'token'};
} else {
    print $res->status_line, "\n";
    print $res->content;
    print "\n";
    exit;
}
print "Got a token of $NessusToken\n";

# OK, at this point we have a valid session token. We can now use this
# to make more requests of the Nessus server. Let's get a list of scans as
# an example. For these we're going to need to create a custom HTTP Header
# in the request called X-Cookie that includes the token we've just got.

my $h = HTTP::Headers->new;
$h->header('X-Cookie' => "token=$NessusToken;");

# Now create a new HTTP request object for the method/URL we want that
# includes this custom header
$req = HTTP::Request->new('GET' , $apibaseurl . 'scans', $h);
$req->content_type('application/json; charset=UTF-8');

# Make the request to the server including the token.
$res = $ua->request($req);

# Didn't work? Explain why and exit.
if (!$res->is_success) {
    warn $res->status_line, "\n";
    warn $res->content . "\n";
    exit;
}

# Still going so the request worked - convert the JSON content into a Perl
# data structure.
my $scandata = from_json($res->content);

# Print it out
print Dumper($scandata);

# Do some funky stuff with the data
foreach my $thisfolder (@{$scandata->{'folders'}}) {
print "$thisfolder->{'name'} is a $thisfolder->{'type'} with ID $thisfolder->{'id'}\n";
}

Timetables

Here at Middleware Towers, we were asked to look into providing students with access to their timetables via a variety of methods. We have already blogged about inputting the data into Google calendars, but for a variety of other applications we wanted a method that was lightweight and portable. We wanted to avoid being too heavily tied into the University’s systems, so we could give the methodology away without having to worry about bespoke parts.

JSON

Readers of our blog (you must be out there somewhere) will know we like JSON; we like it a lot. Here’s why:

JSON (JavaScript Object Notation) is a lightweight format based on the ECMA-262 standard. It is easy for humans to read and write, it is easy for machines to parse and generate, and it uses conventions that programmers using the “C” family of languages will be comfortable with. These properties, combined with its universal data structures, make it an ideal data-interchange language.

Taking the module calendars we are already creating for our students’ Google calendars, it was relatively trivial to also create JSON files of the same data (one per module, created by a script, controlled via cron). These files are in the following format:

[
   {
      "ROOMID_STR" : Harry Palmer 101,
      "MODULEID_STR" : "ABC001",
      "MODULENAME_STR" : "Advanced Converter Cars",
      "SLOTID" : "721",
      "STARTTIME" : "11:00",
      "ENDTIME" : "13:00",
      "LECTURER_STR" : "Professor Pat Pending",
      "DATECHANGED" : "201402140944",
      "MAPDATE" : "20-MAR-15"
   }
]

A further file per module would also be created, which would list those students registered on it. It takes the following format:

{
   "j.bloggs@student." : [
      "ABC001",
      "ABC001 (B)"
   ]
}

The (B) indicates a cohort of the module. This is common practice when module membership is large: it splits the students into smaller groups for such things as tutorials. We wanted our timetables to be fully personalised, so we also created JSON files for any cohorts a module may have.

Now that we had the data in a format we could access and manipulate, we needed to find something which could present it to our students in a manner they were familiar with.

Fullcalendar

Fullcalendar is a jQuery plugin from MIT (who seem to produce an almost unending supply of this sort of thing). It provides a well documented, AJAX based, Google Calendar-like environment. Its API is rich and it is easy to personalise the output. Some simple jQuery settings will allow you to control the default view and the type and position of controls and information:

header: {
    left: 'prev,next today',
    center: 'title',
    right: 'month,agendaWeek,agendaDay'
},
editable: false,
firstDay: 1,
defaultView: 'agendaWeek',

This places the “<”, “>” and “today” buttons on the left, the month, week and day buttons on the right and the title in the middle. It also makes the calendar read only, sets the first day of the week to Monday and shows a week view by default (see fig 1).

fig 1: Fullcalendar configured as above.

Provide the plugin with correctly formatted JSON, as below, and you will get Google Calendar-like output, as seen in fig 2.

[
   {
      "title":"ABC001 Advanced Converter Cars with Prof Pat Pending in Harry Palmer 101",
      "end":"2015-03-20 13:00:00",
      "start":"2015-03-20 11:00:00"
   }
]

fig 2: Event output from JSON in Fullcalendar

Keen-eyed readers will notice that the JSON required by Fullcalendar doesn’t match the JSON we have in our module files. This requires some on-the-fly programmatic manipulation via Perl (the one true language) to stitch together the lecturer, room, ID and module name to form the “title” element. The datetime elements are reformatted using Perl’s localtime, split and join functions.
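A minimal sketch of the kind of translation involved (the field names are those from the module files above; the month map and output layout are illustrative rather than our production code):

#!/usr/bin/perl
use strict;
use warnings;
use JSON;

my %months = ( JAN => '01', FEB => '02', MAR => '03', APR => '04',
               MAY => '05', JUN => '06', JUL => '07', AUG => '08',
               SEP => '09', OCT => '10', NOV => '11', DEC => '12' );

# Convert one slot from a module file into a Fullcalendar event
sub to_event {
    my ($slot) = @_;
    # Turn a MAPDATE like "20-MAR-15" into "2015-03-20"
    my ($day, $mon, $yy) = split /-/, $slot->{'MAPDATE'};
    my $date = join '-', "20$yy", $months{uc $mon}, $day;
    return {
        title => "$slot->{'MODULEID_STR'} $slot->{'MODULENAME_STR'} with " .
                 "$slot->{'LECTURER_STR'} in $slot->{'ROOMID_STR'}",
        start => "$date $slot->{'STARTTIME'}:00",
        end   => "$date $slot->{'ENDTIME'}:00",
    };
}

# Read a module file on STDIN and emit Fullcalendar-ready JSON
my $text  = do { local $/; <STDIN> };
my $slots = from_json($text);
print to_json([ map { to_event($_) } @$slots ], { pretty => 1 });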

Putting it all together gives us a student’s personalised timetable in a portable format that can be used by any of our applications (VLE, portal, student support systems etc.) without any bespoke tie-in to other systems.

Special thanks go to our IT support assistants (both past and present) for being willing guinea pigs, and especially to Tom for remaining positive and optimistic whilst we were destroying his Google Calendar setup on an almost daily basis.

Dashboards

In 2012 we produced a dashboard for our reading list system that allowed academics to see how (and if) their reading lists were being used. This proved to be a great success, both with the academics and in terms of the approach taken to develop it. So a year later, when there was a need to produce some statistics from the newly installed access control system in the Library, we chose to repeat ourselves.

We were supplied with the log files from the access control system and enriched these with additional information from the Library Management System. A database was created and an associated script developed to regularly import the information into it. A small series of CGI scripts was also produced to extract information from the database and return the results in JSON/JSONP format, which is in turn processed by the front-end HTML5 web page using appropriate plug-ins. The dashboard has been used by the Library to promote the success of its recent extension and refurbishment, and is being used to monitor occupancy levels (particularly during peak periods such as exams).
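The JSON/JSONP pattern is simple enough to sketch; the following Perl CGI is illustrative only (the database, table and column names are made up):

#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use DBI;
use JSON;

my $q   = CGI->new;
my $dbh = DBI->connect('dbi:mysql:database=acs', 'dbuser', 'dbpass',
                       { RaiseError => 1 });

# Pull today's hourly entry counts from the (illustrative) occupancy table
my $rows = $dbh->selectall_arrayref(
    'SELECT hour, entries FROM occupancy WHERE day = CURDATE()',
    { Slice => {} });
my $json = to_json($rows);

# If a callback parameter is supplied, wrap the JSON to make it JSONP
my $callback = $q->param('callback');
if (defined $callback && $callback =~ /^\w+$/) {
    print $q->header('application/javascript'), "$callback($json);";
} else {
    print $q->header('application/json'), $json;
}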

And then we did it again!

By taking a snapshot of the lab availability system every fifteen minutes and putting the resulting information into a simple database, we recently used the same tools to develop another dashboard, this time showing the use of IT Services managed lab PCs across the campus. IT staff can now see the occupancy of these labs over time, when they were booked, and even see a map of PC hot spots.

So basically our philosophy seems to be: You’re having a dashboard! You’re having a dashboard! The whole INSTITUTION’s having a dashboard! They’re having a dashboard!

Digital Signage

As part of the recent refurbishment of the University Library, six digital signs were developed and installed to display room/resource bookings and the availability of computers in computer labs.  The underlying technology behind these signs is Raspberry Pis running Screenly OSE, an open source digital signage distribution.  The content displayed is an HTML5 webpage that regularly updates itself with data retrieved from the Library’s room booking APIs and the University’s lab usage APIs.

Cost comparison of Raspberry Pi and other units for driving digital signs

Raspberry Pis provide a significant cost saving when compared to traditional digital signage systems.  The initial cost of the hardware (approximately £50) is an obvious benefit, with other digital signage computers costing between £200 and £500.  There is a supplementary ongoing saving, as the power requirements of a Raspberry Pi are far lower than those of traditional units; in fact a potential saving of over £70 per unit per year is shown when compared to the iBase SI-08 digital signage system.  Taking this a step further, we can compare the units based upon their ongoing impact on the environment, where the Raspberry Pi’s low energy requirements again show that it produces far less KgCO2e (kg carbon dioxide equivalent) than the other units it was compared against.
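To see where the table’s ongoing figures come from: the Raspberry Pi draws 5W, so it uses 5W × 24 hours × 365 days = 43.8kWh per year, which at 15p/kWh costs 43.8 × £0.15 = £6.57.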

Unit                      Approx Cost   Watts   kWh/year   Ongoing cost per unit per year (15p/kWh)   KgCO2e per year
Raspberry Pi              £50           5       43.8       £6.57                                      19.51
Acer Aspire Revo          £300          29      254.04     £38.11                                     113.17
SI-08 (Digital Signage)   £400          60      525.6      £78.84                                     234.14

Scaling the solution

While the initial digital signs were a big success there were still a number of issues that needed to be tackled before the technology could be adopted on a larger scale.  The key areas that needed to be tackled were:

  • Ease of installation and configuration of new units – Ideally the units should simply need plugging in to their screens, network and power and then be available.
  • Devolved administration – It would be essential that individuals can administer their own units and not those owned by others.
  • Ease of administration of each unit – Due to a lack of security options on the open source version of Screenly, the original units were secured by requiring an administrator to use ssh to connect to the unit before they could access the administration web interface.

To meet these requirements a new SD card image was created (based on the open source Screenly image).  This image was enhanced to communicate with a newly developed back-end system that manages the administration of the units.  To avoid having to manually configure the network settings on each unit, the units use DHCP to pick up local network settings and a valid IP address.  Every time a unit is issued with a new IP address, or renews one it has already been issued with, it checks in with the back-end system.

Each time it checks in, it passes the back-end system its current IP address and its MAC address.  If the system hasn’t seen the MAC address before, it creates a new unit record and populates it with the unit’s details; if a record already exists for the unit, it is updated with the new information.
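The check-in itself need be nothing more than a small HTTP POST fired from the unit’s DHCP hooks; a minimal sketch (the endpoint URL is illustrative, not our real back-end):

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Gather the unit's current IP address and its MAC address
chomp(my $ip  = `hostname -I | awk '{print \$1}'`);
chomp(my $mac = `cat /sys/class/net/eth0/address`);

# Report in to the (illustrative) back-end management system
my $ua  = LWP::UserAgent->new;
my $res = $ua->post('https://signage.example.ac.uk/checkin',
                    { ip => $ip, mac => $mac });
warn 'check-in failed: ' . $res->status_line . "\n" unless $res->is_success;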

When system administrators access the back-end system they are presented with a list of any units that haven’t been assigned to a group.  They can then assign the unit a name and a group.  They can also add users to groups, and create new groups if required.

Users can be added as either a standard user or a group administrator.  Currently the only difference between the two is that a group administrator can add new users to their group and remove existing users from their groups.  Each user in a group can access the administration interface of any unit assigned to that group.

In the unit’s administration interface users can add, and order, a number of assets to be displayed.  Each asset has a name, a type (image, webpage or video), the dates and times between which it is valid, and a duration specifying how long it should be displayed for.  The order in which active assets are displayed can be adjusted by dragging an asset about the list.

One final thing to consider when scaling the solution is that the savings discussed earlier also scale.  For example, the six Raspberry Pis running in the Library have, all together, used £16 worth of energy over the last five months; to put it another way, they have used £78 less energy than six Acer Aspire Revos would have, and £180 less than six iBase SI-08 units.

We are currently testing the system, with the intention of digital signs being deployed outside University computer labs ready for the start of the new academic year.

Apps for Education

Google Apps for Education allows HE and FE institutions to make use of Google Apps services for their staff and students. This includes services such as email, calendaring and file store.

At Loughborough University we’ve been investigating how to integrate some of the Google Apps for Education services into our local IT ecosystem to improve the student experience. What follows is a brief overview of two of these developments – one for Google Drive and another extending the use of Google Calendar.

Google Drive

We were asked to look at using Google Drive to provide additional file store for students. This would not only cut down on our file storage costs for students, but would also let them more easily access their files when off campus or using mobile devices. Additionally, they could potentially share their files with third parties (e.g. work placement employers, friends at other institutions, etc.), which would be difficult to do with locally held file stores.

Google provide a web browser based user interface to this service and also a Windows file system plugin. We had hoped that the Windows plugin could be used on our lab PCs around campus to give students a seamless view of the Google Drive file store, but unfortunately it appeared to only use the Google credentials of the user that originally installed it. This was fine on staff machines with only one user, but our lab machines may have many people using them and are locked down to prevent students installing new software.

We therefore looked to the Google Drive Application Programming Interface (API) to see if there was a way we could work round this.  As well as providing multiple users on the same machine with access to the file store, we really wanted to avoid having to ask the user for their password again to access the Google Drive. Instead we intended to make use of the Active Directory logins that they had already made on the lab PCs.

We decided the easiest way to support this was to use a Linux machine running Samba as a “bridge” between the Windows file sharing world and the Google Drive APIs.  Samba is a free, open source implementation of the Common Internet File System (CIFS), which is the underlying protocol that supports Windows file sharing.  As well as being open source, Samba also has the concept of Virtual File System (VFS) modules, which are extensions that allow developers to intercept CIFS operations to provide additional services.

After testing basic functionality via the pre-built PHP client libraries, we started prototyping in Perl in order to work out how to generate the OAuth2 delegated authority JSON Web Token (JWT). OAuth2 allows a Google Apps service account to authenticate to Google’s servers and then act on behalf of one of our Google Apps users.  These prototype Perl scripts were designed to be “thrown away” – they merely allowed us to experiment rapidly in a comfortable development environment.
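The heart of those prototypes was building and signing the JWT, then swapping it for a bearer token. A minimal sketch of that exchange (the service account, delegated user and key file name are all made up):

#!/usr/bin/perl
use strict;
use warnings;
use JSON;
use MIME::Base64 qw(encode_base64url);
use Crypt::OpenSSL::RSA;
use LWP::UserAgent;

my $iss   = 'drive-bridge@example-project.iam.gserviceaccount.com';
my $user  = 'j.bloggs@example.ac.uk';   # the user we act on behalf of
my $scope = 'https://www.googleapis.com/auth/drive';
my $now   = time;

# JWT = base64url(header) . "." . base64url(claims) . "." . base64url(signature)
my $header = encode_base64url(to_json({ alg => 'RS256', typ => 'JWT' }));
my $claims = encode_base64url(to_json({
    iss   => $iss,
    sub   => $user,     # this is the delegation part
    scope => $scope,
    aud   => 'https://accounts.google.com/o/oauth2/token',
    iat   => $now,
    exp   => $now + 3600,
}));

# Sign with the service account's private key using RSA SHA-256
open my $fh, '<', 'service-account-key.pem' or die $!;
my $rsa = Crypt::OpenSSL::RSA->new_private_key(do { local $/; <$fh> });
$rsa->use_sha256_hash;
my $jwt = "$header.$claims." . encode_base64url($rsa->sign("$header.$claims"));

# Swap the signed JWT for a limited lifespan bearer token
my $res = LWP::UserAgent->new->post('https://accounts.google.com/o/oauth2/token', {
    grant_type => 'urn:ietf:params:oauth:grant-type:jwt-bearer',
    assertion  => $jwt,
});
die $res->status_line unless $res->is_success;
my $token = from_json($res->content)->{'access_token'};
print "Bearer token: $token\n";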

Once we had the Perl prototype authenticating successfully and issuing the RESTful Google Drive API calls, we then turned our attention to implementing a Samba VFS module in C. Whilst using the Samba version 4 distribution we were actually working with the Samba 3 part of the code, as much of the new Samba 4 code is concerned with Active Directory (AD) domain controller provision, and all we needed our bridge server to be was a fileserver bound into the campus AD.

This new VFS module would first take the username of the user making the CIFS connection and then look up their email address in the campus AD using LDAP.  Once the email address is returned it can then be used to create a JWT for passing to Google’s OAuth2 service to authenticate the delegated service account access to the user’s Google Drive account. A successful login returns a limited lifespan bearer access token which is then used with subsequent RESTful Google API calls to authenticate each transaction.
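Once the bearer token is in hand, each RESTful Google Drive call simply carries it in an Authorization header. A sketch of a file listing using the Drive v2 API, with a placeholder token:

#!/usr/bin/perl
use strict;
use warnings;
use JSON;
use LWP::UserAgent;

my $token = 'ya29.EXAMPLE';   # bearer token from the JWT exchange above

my $ua  = LWP::UserAgent->new;
my $res = $ua->get('https://www.googleapis.com/drive/v2/files',
                   'Authorization' => "Bearer $token");
die $res->status_line unless $res->is_success;

# Each item in the listing carries the file metadata we map into CIFS replies
foreach my $item (@{ from_json($res->content)->{'items'} }) {
    print "$item->{'title'}\n";
}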

The rest of the Samba VFS module coding was concerned with mapping the CIFS operations to Google Drive operations.  This was quite difficult: not only do CIFS and Google Drive have some underlying differences in the way the file system structure is designed, which required workarounds in the VFS module, but we also had to learn about the CIFS protocol itself.  CIFS has been developed over many years, with additions made as new operating systems and services have been deployed, often with little or no documentation.

Eventually we managed to provide enough mappings for the following simple operations to work:

  1. file or directory creation,
  2. file or directory deletion,
  3. file contents editing,
  4. directory listings,
  5. stat() type system calls to return file information (some of which we had to fake!),
  6. file contents caching (to improve performance),
  7. simple whole file locking on the Windows side (there is no real concept of locking on the Google Drive side as it is naturally a shared space).

The result of our development is that we can now mount one (or more) Google Drive file stores on a Windows PC simply by making a CIFS file sharing mount onto our “bridge” Linux server.  Performance is not stellar but is acceptable for many users – caching the files returned from Google on the file store of the bridge server helps a lot, especially as many people read lots of files but only edit a few in any one session.

Google Drive shown being accessed via both Google’s own web interface and our Samba CIFS bridge

At the moment it is being tested by a number of staff on campus and will be rolled out to shared lab PCs in the Library shortly. One additional advantage of the Samba bridge is that it allows non-Windows machines to mount Google Drive as a file store too.  For example, Linux machines, a platform that Google themselves do not yet support directly, can now mount Google Drive storage using this system.

Calendars

The student Google Apps users get access to Google’s excellent Calendar service. However, there are other, pre-existing University calendars with information of interest to students held elsewhere. One example of this is the lecture, lab and tutorial timetabling system. We were asked if we could attempt to bring the timetable data into the Google Calendar system so that students would have their timetables available alongside their personal calendar entries.

There were a number of issues to solve here. We could have created an extra calendar for each student and injected their personal timetable into it, but this would have required many thousands of calendars to be created.  As lecture details often change during the term, we would also have many more events to maintain; it would not simply be a case of creating the calendar events at the start of the year, term or semester, as the calendars would have to be updated daily.

Instead we decided to create a new calendar for every module being taught. This was still around 1,500 calendars for one semester, but we then further sub-divided them by the “part” of the course (first years are Part A, second years are Part B), with six new users being created to handle the calendars for one part each.

A view of the first year Mathematics undergraduate module timetables for one week in a Google Calendar.

The Google Calendar API then allows us to share an individual module calendar from the “part user” with the student users that are taking that module.

Our system needs at least three different Perl scripts to run:

  1. Create the initially empty module calendars in each of the “part user” accounts (this is the most heavily time dependent part, due to API limitations on the speed and number of calendars you can create at any one time),
  2. Extract data from our existing timetable system’s Oracle database and inject these as events in the appropriate module calendar. This script also needs to handle events that were once in the timetabling system but which have been deleted or changed,
  3. Attach the correct module calendars to each student user. This is done by interrogating both the module group membership from our Active Directory and the current state of module calendar sharing, and then issuing the required remove/add ACL rules using the Google Calendar API (a minimal sketch of this call follows below).
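For illustration, the ACL grant in step 3 boils down to a single authenticated POST per student per module; a sketch against the Calendar v3 API (the token, calendar ID and student address are placeholders):

#!/usr/bin/perl
use strict;
use warnings;
use JSON;
use LWP::UserAgent;
use HTTP::Request;

my $token      = 'ya29.EXAMPLE';   # bearer token from the OAuth2 JWT exchange
my $calendarid = 'abc001@group.calendar.google.com';
my $student    = 'j.bloggs@example.ac.uk';

# Grant the student read access to the module calendar
my $req = HTTP::Request->new(
    POST => "https://www.googleapis.com/calendar/v3/calendars/$calendarid/acl");
$req->header('Authorization' => "Bearer $token");
$req->content_type('application/json');
$req->content(to_json({
    role  => 'reader',
    scope => { type => 'user', value => $student },
}));

my $res = LWP::UserAgent->new->request($req);
die $res->status_line unless $res->is_success;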

Alongside the ability to share a full calendar of module lectures with our students, we are also able to inject events into their primary calendars directly, using the same core Perl modules to manage the OAuth2 delegation. We have begun to use this by creating “all day” events listing a student’s library books due for return.

A Google Calendar showing a library book return reminder entry.

Thanks to APIs we have created for our library systems, we are able to deduce which students are required to return their books in seven days and can create events, with controlled notifications, in the student’s calendar prior to the books’ return date.

We are currently testing these systems, with a subset of the students, to gain feedback and aim to go live for all undergraduate and taught course students in Autumn Term 2014.

New frontend for Booking System

One of the summer tasks we have been dealing with has been bringing the frontend of our Web User Booking System (WUBS) up to date.  The original user interface dates back to when the system was first developed (2005) and, while still more than adequate for the task, was starting to show its age.

Sample booking screen

After a short discussion in the team it was decided to follow a similar method to that which had been used for LORLS, specifically separating the front and back ends via a set of APIs.  The majority of the development effort was spent on the frontend as the core work on the APIs had already taken place to enable our mobile webApp to review, make and cancel bookings.

The key technologies behind the new frontend are HTML5 and jQuery.  Additional jQuery plugins used are DropKick, to provide the nice looking drop down lists, and Datepicker, for selecting dates from a calendar.

Library goes mobile

We have recently developed a mobile webApp to support access to a range of our Library services from student and staff owned mobile devices.

My Library

Library services provided via the webApp currently include opening hours, your borrowing details, reading lists, room booking information, contact details, events and library news.  It is intended that further development will take place over the coming months to incorporate new features into the webApp based upon feedback from our users.

Version 2 of our mobile webApp was released on 7th March 2013 and included the following new features:

  • Renewal of loan items
  • Cancellation of item requests
  • Current Library PC availability (based upon the lab availability system)

Version 3 of the mobile webApp was released on the 23rd April 2013 and included the following additional features:

  • Make a room/resource booking
  • Cancel existing room/resource bookings
  • New “Library on Tour” section (with information about accessing library resources over the summer)

Lab Availability

As part of the development of a student web information portal at Loughborough University, a traffic light style widget showing the usage of our computer labs was posited. This would offer a number of advantages, with students being able to choose the lab they head towards by seeing which labs were currently (or soon to be) booked and, of the available ones, which were busy.

Mash

The Active Directory (AD) was the first port of call for this. Each of our labs has its own organisational unit (OU), which contains all of the machines currently in the lab. The LDAP libraries of PHP could then be used to scoop up the AD computer object for each machine. An AD extension attribute on each managed computer was assigned, containing binary data: 0 would indicate that the machine was not logged into and 1 would mean the machine was busy. Attribute control is handled by LDAP Modify commands that are run as part of the log on and log off processes of the PCs. By counting the number of zeros we could tell how many of a lab’s machines were currently free.
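The counting query itself is a one-liner against the directory. A sketch of the idea in Perl rather than the PHP we actually used (the server, bind account, OU and attribute name are illustrative):

#!/usr/bin/perl
use strict;
use warnings;
use Net::LDAP;

my $ldap = Net::LDAP->new('ad.example.ac.uk') or die "$@";
my $mesg = $ldap->bind('labwatcher@example.ac.uk', password => 'secret');
die $mesg->error if $mesg->code;

# Count computer objects in the lab's OU whose busy flag is still 0
my $search = $ldap->search(
    base   => 'OU=Lab1,OU=Labs,DC=example,DC=ac,DC=uk',
    filter => '(&(objectClass=computer)(extensionAttribute1=0))',
);
printf "%d machines free\n", $search->count;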

More mash

First attempt at the traffic light system

For those labs that are open access, the AD attribute control method offered all of the monitoring that was required. More work was required, however, on those labs that could be booked.

The second stage, therefore, was to query the University’s central timetabling system to see the status of the labs themselves. For this a database view was created, showing the status of the bookable labs throughout the day. SQL queries could now be run against the view and the current and future status of a lab interpreted.

Mash with gravy

The Library uses a separate booking system (WUBS) for its seminar rooms and CAD lab. As these are not controlled from the central timetabling system, more mashup was required to connect to its API and once again poll for the daily booking data. Once this was achieved, it was simply a case of converting the WUBS API data into the same format as the central timetable data and outputting it all. This data could then be taken and combined with HTML5 to produce a unified traffic light display of availability throughout the campus (see our Digital Signage blog post for the details).

Pi and Mash: Raspberry Pi powered HTML5 mashup

What’s About

On the 6th July 2011 UK Discovery and DevCSI started running a competition to encourage the creation of tools/applications that make use of one or more of their open data sources.

Did it work? Well, having looked at their open data sources, I decided to create an application that would help visualise the English Heritage Places dataset and link it with the British National Bibliography dataset.  What’s About is the resulting webapp.

You can have a play with What’s About by pointing your browser at http://whatsabout.lboro.ac.uk/whatsAbout/

If your browser supports the W3C Geolocation API then you will initially be prompted, asking if you want What’s About to know your location.  If you say yes, it will use your location as the basis of the map; if you say no, or your browser doesn’t support the W3C Geolocation API, then the map will display around the Loughborough, Leicestershire and Nottinghamshire area, though you can move around the map and zoom in and out to look at areas that you are interested in.

The core of What’s About is the map, which takes up most of the window, and the options column on the right-hand side.  The options column contains a list of place types, which can be used to limit the places shown to specific types.  If there are over 100 places within the current map view then a grid is shown on the map, with a count of the number of places located within each cell shown at the cell’s centre.

If there are fewer than 100 places located within the current map view then each location will be shown on the map with an icon.  The icon shows the type of place, and if you hover over an icon the name of the place will appear.

The names of the places shown on the map will also appear in the options tab on the right.  If you wish to locate a specific entry from the list, clicking on its name will cause its icon to bounce up and down on the map for a few seconds.

Clicking on an icon will bring up a window with some resources for that place, including a link to the original record in the English Heritage Places dataset, a list of books with the place name contained in the title and a list of links to relevant online resources.

The books are taken from the British National Bibliography dataset, and Related Resources links can be added by anyone using the “Add a wikipedia page” option at the bottom left of the window.  Currently only a Wikipedia URL can be entered for a Related Resource, to reduce the risk of it being used to add spam, though in the long term this could be opened out to any URL, provided there were enough moderators available to remove any spam links.

There is also an extractLinks API that will return a JSON object containing the English Heritage URIs for places with attached online resources, along with the URLs and Titles for the links.
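The exact shape of the extractLinks output isn’t documented above, but it might look something like this (the URI and link values here are purely illustrative):

{
   "http://data.englishheritage.org.uk/place/12345" : [
      {
         "url" : "http://en.wikipedia.org/wiki/Some_Place",
         "title" : "Some Place - Wikipedia"
      }
   ]
}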

Because it is built on Google Maps technology you can also use street view (provided there is a street view available for the area), and the icons currently shown on the map will appear in the street view, helping you identify the actual places you are looking for.

The code behind What’s About is available as open source via a GPL licence.  The current version of the code can be downloaded from http://whatsabout.lboro.ac.uk/whatsAbout/WhatsAbout-1.tar.gz

Update: Well, I have just found (and fixed) a few more bugs and also improved the street view functionality so that it now updates the places on the map as you move about.  The new version of the code is available from http://lb-public.lboro.ac.uk/whatsAbout/WhatsAbout-1.1.tar.gz