Posts Tagged ‘Reports’

Bean counting

Friday, April 30th, 2010

One of the many things I love about Koha is the easy access I have to statistics. I’m a do-it-yourselfer, so I seldom use Koha’s built-in reports. Instead I go directly to the source: Koha’s statistics table.

Field Type Null Key Default Extra
datetime datetime YES MUL NULL
branch varchar(10) YES NULL
proccode varchar(4) YES NULL
value double(16,4) YES NULL
type varchar(16) YES NULL
other mediumtext YES NULL
usercode varchar(10) YES NULL
itemnumber int(11) YES NULL
itemtype varchar(10) YES NULL
borrowernumber int(11) YES NULL
associatedborrower int(11) YES NULL

The statistics table is long-term memory storage for circulation transactions: checkouts, check-ins, renewals, and payments. The kind of reports we depend on most are circulation reports, so I usually focus on checkouts and renewals.

Just a few of these fields are relevant to most of my circulation reports:

  • datetime stores the date and time of the transaction.
  • branch stores the three-letter code for the branch where the transaction took place.
  • type records whether the statistic is a checkout (“issue”), check-in (“return”), renewal (“renew”), payment (“payment”), or write-off (“writeoff”).
  • itemnumber, which records the id of the item in Koha’s catalog.

Using these pieces we can put together a query which counts circulations per branch for a given month:

SELECT branch,count(*) FROM statistics WHERE year(datetime) = 2009 AND month(datetime) = 12 AND (type = 'issue' OR type = 'renew') GROUP BY branch ORDER BY branch;
branch count(*)
ALB 2762
APL 21475
COV 1746
CPL 616
GPL 2725
NPL 5475
PPL 5391

7 rows in set (37.16 sec)

One important thing to note about the results of that query: It took over thirty-seven seconds to execute. That’s ages in MySQL terms, and a cause for caution. We’ve been using Koha since 2003, so our statistics table is huge: almost nine million rows. For that reason I don’t run my statistics queries directly on our production Koha server. I back up the statistics table to a separate database where it won’t interfere with the performance of our Koha operations.

Getting fancy

Querying MySQL directly can get you many of the numbers you want, but if you want to aggregate data and manipulate it in fun ways you have to add a scripting layer to the process. For Koha that’s Perl, for me it’s PHP. The end result that I want to share is a report of circulations per month for all branches. This report shows not just the raw numbers from Koha but also formats them as a graph using Google’s Chart API.

This was just a quick look at the kind of work I’ve been doing recently. There are a lot of pieces of the puzzle that I’ve glossed over. I hope at the very least it’s an interesting glimpse of what is possible when you have easy access to your data and the tools to manipulate it.

Custom printed overdue postcards

Monday, February 22nd, 2010

My library has a long and complicated history of how to deal with overdue notifications. We’ve tried a few different forms of printed notices, and at one time we even tried to telephone each patron with overdues. After we switched to Koha we got a new option: sending overdue notices via email. Koha didn’t have a built-in notification system at the time, but the open nature of the application meant we could create our own scripted process to query the Koha database and get the information we needed.

The script, written by our director at the time, Stephen Hedges, queried the Koha database for patrons with overdues and did a few things at once with the results of this query:

  1. Send overdue notices via email.
  2. Create a file of overdues data for patrons without email.
  3. Restrict (“debar”) patrons with items overdue for more than 30 days.

In Koha 3, some of this work can be taken over by new built-in notification functionality. With the Overdue notice/status triggers feature we can define when overdue notices should be sent to patrons and when patrons should be restricted for their overdues (taking care of #3 above).

The drawback to Koha 3′s Notices feature is that it doesn’t give you much help when it comes to printed overdue notices. Koha will send an email notice to any patron who has an email address in their patron record (taking care of  #1 above). For patrons without email, Koha collects all of their overdue notices into a single email which is sent to the Koha administrator. Presumably the idea is that the messages can be printed out and mailed by hand.

Unfortunately this won’t work for us: We don’t want to send letters. It’s too much manual labor to stuff envelopes and stick stamps.  We want to send post cards. They’re cheaper. Of course for privacy reasons (not to mention space constraints) we can’t print out a list of a patron’s overdue items on a postcard. We want to send them a generic reminder which includes our web address (for online renewals) and phone numbers to our branches. So even with improvements in Koha 3 we need to do some work to take care of #2 from the list of tasks performed by the old overdues script.

Let’s take a look at how the old script worked. It began by performing this query:

SELECT issues.borrowernumber, firstname, surname, streetaddress, physstreet, city, zipcode, emailaddress
FROM issues, borrowers
WHERE returndate IS NULL
AND TO_DAYS( NOW( ) ) - TO_DAYS( date_due )
BETWEEN 8 AND 30
AND issues.borrowernumber = borrowers.borrowernumber
AND gonenoaddress < 1
AND borrowers.categorycode != 'HB' ORDER BY issues.borrowernumber

It limited the query to items which were between 8 and 30 days overdue, where the “gonenoaddress” flag was unset (less than one, meaning in this case zero), and where the patron didn’t have the ‘HB’ categorycode (our “homebound” category for home-delivery patrons).

As far as printed notices are concerned, the purpose of the old script was to create a CSV file containing the name and address of each patron with overdues.  Since we’re going to send a non-personalized postcard to each patron, we don’t want the output to include any other personal details. Here’s what the results looked like:

“name” “address1″ “address2″ “city” “zipcode”
“John Smith” “1 Main Street” “” “Nelsonville OH” “45764″

In updating this for Koha 3 we can make that query a little bit more portable by having it check whether the patron in question has a category that requires overdue notices:

SELECT issues.borrowernumber, borrowers.firstname, borrowers.surname, borrowers.address, borrowers.address2, borrowers.city, borrowers.zipcode
FROM issues, borrowers, categories
WHERE issues.returndate IS NULL
AND TO_DAYS( NOW( ) ) - TO_DAYS( issues.date_due )
BETWEEN 8 AND 30
AND issues.borrowernumber = borrowers.borrowernumber
AND borrowers.gonenoaddress < 1
AND borrowers.categorycode = categories.categorycode AND categories.overduenoticerequired = 1
AND borrowers.email IS NOT NULL
AND borrowers.email != ''
ORDER BY borrowers.surname,borrowers.firstname

I’ve added a check that categories.overduenoticerequired is 1, which eliminates the need to hard-code the ‘HB’ patron category. I’ve also added a check to make sure the patron email address field isn’t empty (the old script performed that check elsewhere).

Since I’m more comfortable in PHP than Perl, I use a PHP script to query MySQL and format the results to be saved as a CSV file just like the old script did.

Taking it to the Post Office

We’ve successfully pulled the data we need. What next? We use an online service called Click2Mail, a “trusted and accredited partner to the United States Postal Service.” Click2Mail allows us to define a custom postcard template with our own personalized message. We can then upload our CSV file containing the name and address information for each patron with overdues. The service parses that CSV file, prints one postcard for each patron, and delivers it to the USPS for delivery. Click2Mail even checks your mailing addresses for possible errors and lets you review them.

Their cost estimator puts the cost of sending 100 single-sided postcards at about $30. Considering the time, effort, and cost of stuffing and stamping 100 letters, Click2Mail seems to be a great value.

New SQL Repository on the Koha Wiki

Thursday, April 30th, 2009

Koha users have begun building a library of useful SQL statements for use in building Koha reports. It’s on the Koha Wiki. You can add your own or put in a request for a report you’d like to know how to do. If you’d like to contribute you can either register on the site or use OpenID to log in. If you’ve never edited a wiki before, be sure to check out the the Wiki’s help page on editing before you jump in.

Getting started with statistics

Tuesday, March 17th, 2009

Koha 3 has always had a selection of built-in reports, and Koha 3 adds additional reports and new a “Guided Reports” system (partially sponsored by this library). The Guided Reports system is still a little rough around the edges, but folks are doing some interesting stuff with them, in particular by using the system’s ability to run custom SQL queries via the Koha interface.

I’d like to take some time to explore how we’re starting to leverage the data that Koha collects to build some reports about how the library is being used. I’ll start with circulation statistics.

The Statistics Table

The health of the library is (for better or worse) judged by its circulation statistics, so that’s first priority. In Koha, every checkout, check-in, and renewal is recorded in one table in the database called statistics. Here’s what it looks like:

Field Type Collation Null Key Default
datetime datetime Yes MUL NULL
branch varchar(10) utf8_general_ci Yes NULL
proccode varchar(4) utf8_general_ci Yes NULL
value double(16,4) Yes NULL
type varchar(16) utf8_general_ci Yes NULL
other mediumtext utf8_general_ci Yes NULL
usercode varchar(10) utf8_general_ci Yes NULL
itemnumber int(11) Yes NULL
itemtype varchar(10) utf8_general_ci Yes NULL
borrowernumber int(11) Yes NULL
associatedborrower int(11) Yes NULL

Some of those columns aren’t even used–I’m not sure if they were in the past, or whether there were plans for them for the future. other, usercode, and associatedborrower don’t seem to be in use.

  • datetime records the time and date of the transaction.
  • branch is the location of the transaction.
  • proccode is related to tracking patron fines, payments etc.
  • value records a currency amount (for fines, payments, etc).
  • type records the type of transaction:  issue [checkout], return [check-in], renew, payment, or writeoff.
  • itemnumber is the id number (defined by items table) of the item that was handled in the transaction.
  • itemtype is a category assigned to the item as defined in Koha’s Item Type management.
  • borrowernumber is the id number (defined by the members table) of the patron involved in the transaction.

When you check something out in Koha, a line is added to this statistics table (SELECTing datetime,branch,type,itemnumber,itemtype, and borrowernumber):

datetime branch type itemnumber itemtype borrowernumber
2008-10-07 14:07:31 CPL issue 40235 CIRC 20351

To get a quick look at how much you’ve circulated today, you could run this:

select count(*) from statistics where year(datetime) = year(curdate()) AND month(datetime) = month(curdate()) AND DAY(datetime) = day(curdate()) and (type='issue' OR type='renew');

The SQL matches the year, month, and day of transactions against today’s date and limits the results to checkouts and renewals. Here’s our count for March 16, 2009:

count(*)
2190

1 row in set (36.76 sec)

Notice how long the query took. The statistics table gets really big: ours has data going back to May 2003, and it has 7,581,248 rows in it. That makes for some slow queries.

Getting a quick count is a great way to put a number to your day’s work. At the end of the month, though, you’ll want to get some good numbers to show your Board about what kind of business you did. Let’s break it down by collection code so we can see what kind of materials our patrons checked out. We’ll look at the Athens branch’s circulation last month:

SELECT items.ccode, COUNT( statistics.itemnumber ) AS count
FROM items, statistics
WHERE statistics.branch = 'APL'
AND statistics.itemnumber = items.itemnumber
AND year( statistics.datetime ) =2009
AND month( statistics.datetime ) =02
AND (
statistics.type = 'issue'
OR statistics.type = 'renew'
)
GROUP BY items.ccode

We get a nice breakdown of how each category (or at least a selection, in this example) fared during the month:

ccode count
AB 427
AF 1355
AV 1002
CDM 681
DVD 1006
EASY 1754
JF 519
JNF 1028
LP 667
MYS 770
NF 1994

You should be able to try out these examples yourself using the Guided Reports system in Koha (or right in MySQL if you have direct access to your database).


Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported.