Monthly Archives: April 2010

Bean counting

One of the many things I love about Koha is the easy access I have to statistics. I’m a do-it-yourselfer, so I seldom use Koha’s built-in reports. Instead I go directly to the source: Koha’s statistics table.

Field Type Null Key Default Extra
datetime datetime YES MUL NULL
branch varchar(10) YES NULL
proccode varchar(4) YES NULL
value double(16,4) YES NULL
type varchar(16) YES NULL
other mediumtext YES NULL
usercode varchar(10) YES NULL
itemnumber int(11) YES NULL
itemtype varchar(10) YES NULL
borrowernumber int(11) YES NULL
associatedborrower int(11) YES NULL

The statistics table is long-term memory storage for circulation transactions: checkouts, check-ins, renewals, and payments. The kind of reports we depend on most are circulation reports, so I usually focus on checkouts and renewals.

Just a few of these fields are relevant to most of my circulation reports:

  • datetime stores the date and time of the transaction.
  • branch stores the three-letter code for the branch where the transaction took place.
  • type records whether the statistic is a checkout (“issue”), check-in (“return”), renewal (“renew”), payment (“payment”), or write-off (“writeoff”).
  • itemnumber, which records the id of the item in Koha’s catalog.

Using these pieces we can put together a query which counts circulations per branch for a given month:

SELECT branch,count(*) FROM statistics WHERE year(datetime) = 2009 AND month(datetime) = 12 AND (type = 'issue' OR type = 'renew') GROUP BY branch ORDER BY branch;
branch count(*)
ALB 2762
APL 21475
COV 1746
CPL 616
GPL 2725
NPL 5475
PPL 5391

7 rows in set (37.16 sec)

One important thing to note about the results of that query: It took over thirty-seven seconds to execute. That’s ages in MySQL terms, and a cause for caution. We’ve been using Koha since 2003, so our statistics table is huge: almost nine million rows. For that reason I don’t run my statistics queries directly on our production Koha server. I back up the statistics table to a separate database where it won’t interfere with the performance of our Koha operations.

Getting fancy

Querying MySQL directly can get you many of the numbers you want, but if you want to aggregate data and manipulate it in fun ways you have to add a scripting layer to the process. For Koha that’s Perl, for me it’s PHP. The end result that I want to share is a report of circulations per month for all branches. This report shows not just the raw numbers from Koha but also formats them as a graph using Google’s Chart API.

This was just a quick look at the kind of work I’ve been doing recently. There are a lot of pieces of the puzzle that I’ve glossed over. I hope at the very least it’s an interesting glimpse of what is possible when you have easy access to your data and the tools to manipulate it.