Is it ethical to add Google Analytics to a library catalog?

One talk I attended at this year’s Innovative Users Group conference in San Francisco (about which I started writing a general post, but found it difficult to strike, er, the right tone) focused on using Google Analytics in the library catalog.

There was really a lot to like in this presentation; the presenter showed how it helped his institution discover how students were searching, compare discovery of e-books via the catalog vs. via the e-book databases, and more. He even offered to share his Analytics data with librarians who also use the tool, and provided links to custom reports and custom segments that help shape the data Analytics provides. This is one of the things Google does so well: allowing users of its products to collaborate.

One of the great advantages of Analytics was its ability to track users’ search terms: since the URLs an Innovative catalog produces include the search terms, you can extract the queries themselves.
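
To make that concrete, here is a minimal sketch of pulling the query out of a catalog URL. The host name and the `searchtype` and `searcharg` parameter names are assumptions for the example, not something from the talk; check the URLs your own catalog generates.

```javascript
// Sketch: extract the search term from a catalog URL.
// ASSUMPTION: the query travels in a "searcharg" parameter and the index
// in "searchtype"; your catalog's URLs may be shaped differently.
function extractSearch(catalogUrl) {
  var params = new URL(catalogUrl).searchParams;
  var term = params.get('searcharg');
  if (term === null) {
    return null; // not a search page (or a different URL pattern)
  }
  return { type: params.get('searchtype'), term: term };
}

// Hypothetical URL, for illustration only.
var example = extractSearch(
  'https://catalog.example.edu/search/?searchtype=X&searcharg=climate+change'
);
console.log(example);
```

Anything that can read the page URL can do the same, which is the whole point of the next section.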

The problem

The issue of privacy came up briefly during the talk, when the speaker explained how he used a filter to strip out patron numbers on pages associated with a user account. A much broader privacy issue was not discussed: the fact that a library that adds Google Analytics is sharing its users’ search queries with Google.
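
The speaker did this with a filter inside Analytics. A different approach, sketched below, is to scrub the number in the browser so it is never part of the reported page path. The `/patroninfo.../1234567/...` URL shape and the `PATRON` placeholder are assumptions for illustration; `_trackPageview` with an explicit path is part of the classic `_gaq` tracking code.

```javascript
// Sketch: replace a patron record number in a page path before reporting it.
// ASSUMPTION: account pages look like /patroninfo~S1/1234567/items; adjust
// the pattern to whatever your catalog actually produces.
function scrubPatronNumber(path) {
  return path.replace(/(\/patroninfo[^\/]*\/)\d+/, '$1PATRON');
}

var _gaq = _gaq || [];
// Report the scrubbed path instead of the real one.
_gaq.push(['_trackPageview', scrubPatronNumber('/patroninfo~S1/1234567/items')]);
```

Paths that don't match the pattern pass through unchanged, so the same call could be used on every page.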

Using Google Analytics is generally thought of as a no-brainer. As techy librarians Darlene Fichter and Jeff Wisniewski put it in a recent introductory piece on Analytics in Online Magazine (not freely available, sorry):

Why Google Analytics? Because it’s one of the most widely used site-usage packages out there. It’s robust, easy to use, and web-based. It can track both desktop and mobile usage. Plus, you can’t beat the price—it’s free.

I’ve been using Google Analytics for various projects, including the library search widget I created for our learning management system. I love that it allows me to track searches originating from the search boxes I put on there. (Don’t examine the screenshot too closely; I screwed something up there, turns out Analytics is case-sensitive.)

Content drilldown page showing number of catalog and database searches

Here we’re looking at use of the resource—how many people used various search boxes. Tracking the queries themselves, as you would do in the library catalog, seems to expose our users in a much more meaningful way. Librarians tend to guard patron privacy closely, going on about the Patriot Act and engaging in unpleasant battles over attempts by the administration, the general public etc. to impede unfettered/unsurveilled Internet access. So why the blind spot here?

Some possibilities:

We don’t understand that we are compromising patron privacy.

This could be true in some cases; after all, the Google Analytics reports we can access don’t display visitors’ IP addresses, so maybe some people might think Google doesn’t retain them either. I doubt most librarians would make this assumption, though; if you’re savvy enough to work with Analytics, you’re probably savvy enough to think it through this far.

We figure that disclosing our practices is sufficient.

Some libraries post a privacy policy and include info about their use of Analytics there (Virginia Tech, where the librarian presenting at IUG works, does this). Our library doesn’t, and it should. (Note: it appears that users of Analytics are actually required to post a privacy policy! Didn’t know that… See GAnalytics Terms of Service, no. 7.) 

But still, I don’t find this solution totally satisfying. First, I don’t expect students to read any of our policies, at least not until they get into trouble, or go to library school. Yes, those who are very interested in online privacy might do so, and in fact Google has browser extensions for opting out of all Google Analytics tracking for such users. Do-not-track features in recent browser builds are another option for individual users. But this is opt-out; really? If we value patron privacy, shouldn’t we make pro-privacy policies the default, and if anything allow users to opt in to being tracked?
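
For what it's worth, an opt-in default doesn't look hard to build. Here is a rough sketch, assuming a placeholder property ID (`UA-XXXXX-X`) and a made-up cookie name (`analytics_optin`) that some "yes, track me" link on the site would set. The `ga-disable-...` window property is Google's documented way to switch the tracker off for a page; everything else is my own invention.

```javascript
// Sketch: track only visitors who have explicitly opted in.
// ASSUMPTIONS: 'UA-XXXXX-X' is a placeholder property ID, and the opt-in is
// remembered in a cookie named "analytics_optin" (a hypothetical name).
function mayTrack(doNotTrack, cookieString) {
  if (doNotTrack === '1' || doNotTrack === 'yes') {
    return false; // a browser-level do-not-track setting always wins
  }
  return /(^|;\s*)analytics_optin=1(;|$)/.test(cookieString);
}

if (typeof window !== 'undefined') {
  // Must be set before the tracking code runs on the page.
  window['ga-disable-UA-XXXXX-X'] =
    !mayTrack(navigator.doNotTrack, document.cookie);
}
```

With this in place, a visitor who has never clicked anything is not tracked, which is the reverse of the opt-out arrangement described above.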

We think it’s not that big a deal

I’m on the fence myself on this one; is it really such a big deal when we use Google trackers in a catalog? Analytics is all over the Web, after all; our visitors are tracked most places they go. So putting it in the catalog can’t be that terrible. All we are doing is sharing our patrons’ search queries with a publicly traded corporation that engages in the most thorough data collection project in human history…

But even still, does Google care in any meaningful sense? Will it do anything with this information? Is privacy meaningfully compromised? I’m having trouble getting anything helpful out of Google’s privacy policies. Read the "Information Sharing" section of Google’s Privacy Policy. It essentially says, "we’ll do the best we can, but there are certain conditions under which we’ll disclose information." What more could they say? But the fact remains: we are sending off our patrons’ search data to a third party over which we have no control and exert little if any influence.

We figure it’s worth the trade-off

This is related to the previous point, and may be where a lot of librarians come down on the issue. Yes, we could say, we are compromising patron privacy in a way we usually find distasteful. But we get so much cool stuff in exchange! You get data, quick built-in graphing tools, exportable tables, all sorts of presentation-ready graphs. You can tinker with the filters and custom reports and share your work. In a sense, you get a share of Google’s massive power. Yeah, I’m just speculating here, but if there’s anything to it, it doesn’t reflect well on us.

So are there ways to get good data on what our users are doing with our catalogs that doesn’t involve sharing their queries with Google?

Alternatives

Your ILS

There are other ways of getting some data from our catalogs. They are not very powerful. I somehow only learned recently of Innovative’s reports for the OPAC in Millennium; not much there. You can run some basic reports, though they are not real fine-grained, and you have to run and export them regularly. The data won’t be available online unless you do some work to put it there. Ugly, not versatile, labor-intensive. I really have no idea what built-in analytics the other major ILSs include.

Piwik

Piwik is an open-source analytics package that offers a lot of the same benefits as Google’s (quick graphs, sorting, filters, reports etc.), but you run it on your own server and have complete control over it, with no third parties involved. I’ve tried it out on a low-traffic site I manage and it seems to perform ok.

Is it as cool as GAnalytics? Hard to say, but I’m going to assume it’s not, because it doesn’t have the sheer scale of Google’s organization behind it. So here’s the question: do we need the best product? Are we willing to trade user privacy in order to get it, or can we survive with just a basically ingenious but not-as-amazing thing that allows us to preserve our principles?

But, there’s a catch: that locally hosted thing. You’ve got to set up your MySQL database (not a complicated thing to do, but something that may have to be cleared organizationally, then backed up and maintained); upgrade the software when necessary, because there might be some vulnerabilities in need of patching; if someone hacks into it, it’s on you to fix everything; apparently you need to set up some sort of stats archiving schedule so that the Piwik interface doesn’t get too bogged down. All of this is of course the whole reason we love to outsource these things to Google.

Telling Google not to do what it wants to do

This may be a solution that makes moot all the handwringing above. When I was looking into this stuff I was surprised to find that Privacy Choice, an organization I had been told was up on these things, has opted to use Google Analytics. I learned that there are some settings you can use to protect your visitors. Most impressively, there is a line you can add to your tracking code (namely, _gaq.push(['_gat._anonymizeIp']);) that anonymizes IP addresses, which Google calls IP masking.
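
Here is roughly where that line sits in the classic asynchronous tracking code. The property ID is a placeholder, and I've left out the loader that fetches ga.js. As far as I understand it, queued commands run in order, so the masking line should come before the pageview is sent.

```javascript
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']); // placeholder property ID
_gaq.push(['_gat._anonymizeIp']);          // ask for IP masking first
_gaq.push(['_trackPageview']);             // then report the page view
```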

The Google Analytics for Wordpress plugin actually builds this in as a selectable option:

 Anonymize IP's: This adds _anonymizeIP, telling Google Analytics to anonymize the information sent by the tracker objects by removing the last octet of the IP address prior to its storage.
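
To picture what "removing the last octet" means for an IPv4 address, here is a tiny illustration. It is not Google's code, just the operation the description implies, and the address is a documentation example.

```javascript
// Illustration only: zero out the last octet of a dotted IPv4 address.
function maskLastOctet(ip) {
  var parts = ip.split('.');
  if (parts.length !== 4) {
    throw new Error('expected a dotted IPv4 address');
  }
  parts[3] = '0';
  return parts.join('.');
}

console.log(maskLastOctet('192.0.2.147')); // 192.0.2.0
```

So the stored address still points at a block of up to 256 hosts, which on a small campus network may not be all that anonymous.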

Google could of course make this a selectable option when you generate the code for GAnalytics. Why don’t they? Hm.

It also appears that you can opt out of third-party sharing:

You can decide if you want to share your Google Analytics data with Google, and also have full control over how you share it with us. Visit the Edit Account and Data Sharing Settings page from within your account to opt in to sharing your data "With Google products only" or "Anonymously with Google and others."

But I can’t actually find where to do this. This is not the first time I’ve had this experience with Google help pages; most likely the interface has changed and the help page has not been updated. But even still, this is not a total solution, since third parties are only one piece of the problem.
