Tag Archives: statistics

Bean counting

One of the many things I love about Koha is the easy access I have to statistics. I’m a do-it-yourselfer, so I seldom use Koha’s built-in reports. Instead I go directly to the source: Koha’s statistics table.

Field Type Null Key Default Extra
datetime datetime YES MUL NULL
branch varchar(10) YES NULL
proccode varchar(4) YES NULL
value double(16,4) YES NULL
type varchar(16) YES NULL
other mediumtext YES NULL
usercode varchar(10) YES NULL
itemnumber int(11) YES NULL
itemtype varchar(10) YES NULL
borrowernumber int(11) YES NULL
associatedborrower int(11) YES NULL

The statistics table is long-term memory storage for circulation transactions: checkouts, check-ins, renewals, and payments. The kind of reports we depend on most are circulation reports, so I usually focus on checkouts and renewals.

Just a few of these fields are relevant to most of my circulation reports:

  • datetime stores the date and time of the transaction.
  • branch stores the three-letter code for the branch where the transaction took place.
  • type records whether the statistic is a checkout (“issue”), check-in (“return”), renewal (“renew”), payment (“payment”), or write-off (“writeoff”).
  • itemnumber, which records the id of the item in Koha’s catalog.

Using these pieces we can put together a query which counts circulations per branch for a given month:

SELECT branch,count(*) FROM statistics WHERE year(datetime) = 2009 AND month(datetime) = 12 AND (type = 'issue' OR type = 'renew') GROUP BY branch ORDER BY branch;
branch count(*)
ALB 2762
APL 21475
COV 1746
CPL 616
GPL 2725
NPL 5475
PPL 5391

7 rows in set (37.16 sec)

One important thing to note about the results of that query: It took over thirty-seven seconds to execute. That’s ages in MySQL terms, and a cause for caution. We’ve been using Koha since 2003, so our statistics table is huge: almost nine million rows. For that reason I don’t run my statistics queries directly on our production Koha server. I back up the statistics table to a separate database where it won’t interfere with the performance of our Koha operations.

Getting fancy

Querying MySQL directly can get you many of the numbers you want, but if you want to aggregate data and manipulate it in fun ways you have to add a scripting layer to the process. For Koha that’s Perl, for me it’s PHP. The end result that I want to share is a report of circulations per month for all branches. This report shows not just the raw numbers from Koha but also formats them as a graph using Google’s Chart API.

This was just a quick look at the kind of work I’ve been doing recently. There are a lot of pieces of the puzzle that I’ve glossed over. I hope at the very least it’s an interesting glimpse of what is possible when you have easy access to your data and the tools to manipulate it.

New SQL Repository on the Koha Wiki

Koha users have begun building a library of useful SQL statements for use in building Koha reports. It’s on the Koha Wiki. You can add your own or put in a request for a report you’d like to know how to do. If you’d like to contribute you can either register on the site or use OpenID to log in. If you’ve never edited a wiki before, be sure to check out the the Wiki’s help page on editing before you jump in.

Getting started with statistics

Koha 3 has always had a selection of built-in reports, and Koha 3 adds additional reports and new a “Guided Reports” system (partially sponsored by this library). The Guided Reports system is still a little rough around the edges, but folks are doing some interesting stuff with them, in particular by using the system’s ability to run custom SQL queries via the Koha interface.

I’d like to take some time to explore how we’re starting to leverage the data that Koha collects to build some reports about how the library is being used. I’ll start with circulation statistics.

The Statistics Table

The health of the library is (for better or worse) judged by its circulation statistics, so that’s first priority. In Koha, every checkout, check-in, and renewal is recorded in one table in the database called statistics. Here’s what it looks like:

Field Type Collation Null Key Default
datetime datetime Yes MUL NULL
branch varchar(10) utf8_general_ci Yes NULL
proccode varchar(4) utf8_general_ci Yes NULL
value double(16,4) Yes NULL
type varchar(16) utf8_general_ci Yes NULL
other mediumtext utf8_general_ci Yes NULL
usercode varchar(10) utf8_general_ci Yes NULL
itemnumber int(11) Yes NULL
itemtype varchar(10) utf8_general_ci Yes NULL
borrowernumber int(11) Yes NULL
associatedborrower int(11) Yes NULL

Some of those columns aren’t even used–I’m not sure if they were in the past, or whether there were plans for them for the future. other, usercode, and associatedborrower don’t seem to be in use.

  • datetime records the time and date of the transaction.
  • branch is the location of the transaction.
  • proccode is related to tracking patron fines, payments etc.
  • value records a currency amount (for fines, payments, etc).
  • type records the type of transaction:  issue [checkout], return [check-in], renew, payment, or writeoff.
  • itemnumber is the id number (defined by items table) of the item that was handled in the transaction.
  • itemtype is a category assigned to the item as defined in Koha’s Item Type management.
  • borrowernumber is the id number (defined by the members table) of the patron involved in the transaction.

When you check something out in Koha, a line is added to this statistics table (SELECTing datetime,branch,type,itemnumber,itemtype, and borrowernumber):

datetime branch type itemnumber itemtype borrowernumber
2008-10-07 14:07:31 CPL issue 40235 CIRC 20351

To get a quick look at how much you’ve circulated today, you could run this:

select count(*) from statistics where year(datetime) = year(curdate()) AND month(datetime) = month(curdate()) AND DAY(datetime) = day(curdate()) and (type='issue' OR type='renew');

The SQL matches the year, month, and day of transactions against today’s date and limits the results to checkouts and renewals. Here’s our count for March 16, 2009:

count(*)
2190

1 row in set (36.76 sec)

Notice how long the query took. The statistics table gets really big: ours has data going back to May 2003, and it has 7,581,248 rows in it. That makes for some slow queries.

Getting a quick count is a great way to put a number to your day’s work. At the end of the month, though, you’ll want to get some good numbers to show your Board about what kind of business you did. Let’s break it down by collection code so we can see what kind of materials our patrons checked out. We’ll look at the Athens branch’s circulation last month:

SELECT items.ccode, COUNT( statistics.itemnumber ) AS count
FROM items, statistics
WHERE statistics.branch = 'APL'
AND statistics.itemnumber = items.itemnumber
AND year( statistics.datetime ) =2009
AND month( statistics.datetime ) =02
AND (
statistics.type = 'issue'
OR statistics.type = 'renew'
)
GROUP BY items.ccode

We get a nice breakdown of how each category (or at least a selection, in this example) fared during the month:

ccode count
AB 427
AF 1355
AV 1002
CDM 681
DVD 1006
EASY 1754
JF 519
JNF 1028
LP 667
MYS 770
NF 1994

You should be able to try out these examples yourself using the Guided Reports system in Koha (or right in MySQL if you have direct access to your database).