ANU DSpace Statistics Installation Guide

Introduction

This document is divided into two sections. The first describes the installation process for XML and HTML reports on a standard DSpace installation; the second contains instructions for taking advantage of the Cocoon graphical reports.

Technical Requirements

The following are requirements for installing the ANU Statistics feature:

Installation Instructions - XML and HTML reports

As this DSpace extension is still an early release, installation is a manual series of steps and is recommended only for technical users. To install:

  1. Untar the distribution (tar xvfz dspace-stats.tar.gz) and cd to the distribution directory.
  2. Build the jar file using ant; this creates a dspace-stats.jar file in the lib directory. Copy the jar into your DSpace source lib directory. Before building, ensure that your CLASSPATH environment variable includes the DSpace jars.

    Note: The source tree includes modified versions of some servlets, namely RetrieveServlet, HTMLServlet, BitstreamServlet and HandleServlet. The modifications add the client IP address to the events logged for item and bitstream views (e.g. log.info(LogManager.getHeader(context, "view_bitstream", "bitstream_id=" + bitstream.getID())); is changed to log.info(LogManager.getHeader(context, "view_bitstream", "bitstream_id=" + bitstream.getID() + ":ip_addr=" + request.getRemoteAddr()));). The Java classes provided with this distribution are based on DSpace 1.3.1 code. If you are running a different version, either remove them completely (which removes the ability to generate reports by IP), replace them with the modules from the version you are running with the changes applied, or apply the changes to your own existing versions of these servlets.

  3. Update dspace-web.xml in your DSpace source distribution with the new servlets and servlet mappings, which are provided in the web-xml-mods.xml file. You can ignore the ReportDispatcher servlet if you do not intend to install the Cocoon reports. If you wish to use the IP filter capability with the versions of the servlets provided in this distribution, also update the entries for the servlets listed above.
  4. Create the database tables: The former command will create the tables templog, view_item_log, view_bitstream_log, and ip_filter. The latter command will create a trigger to update the log tables whenever a "view item" or "view bitstream" event occurs.
  5. Copy the JSPs from jsp/mydspace into the local JSP area of your DSpace source: cp -R jsp/mydspace $DSPACE_SOURCE/jsp/local. Note that main.jsp is the MyDSpace page; if you have already customised it, you need to apply the changes from this distribution's main.jsp to your version. Note also that if you are running a different DSpace version you will need to adjust the JSPs to fit your version.
  6. Copy the config/querylist.xml file into your DSpace config directory.
  7. Copy the entire xsl directory into your DSpace directory (i.e. at the same level as the config directory).
  8. Add the following to the Messages.properties file:
    jsp.stats = Stats Page
    jsp.mydspace.main.stats.heading1 = Stats & Report Generators (Admin Only)
    jsp.mydspace.main.stats.heading2 = Cocoon Report Generator Application
    jsp.mydspace.main.stats.heading3 = Subscriptions
    jsp.mydspace.main.stats.button.xml = xml stats
    jsp.mydspace.main.stats.button.html = html stats
    jsp.mydspace.main.stats.button.graph = graph stats
    jsp.mydspace.main.stats.button.cocoon-report = Cocoon report
  9. Add the following to the dspace.cfg file, using values appropriate to your installation:
    ##### ANU Statistics Settings #####
    stylesheet.dir = /dspace/xsl
    report.enabled = true
    report.querylist.file = /dspace/config/querylist.xml
    report.stylesheet.dir = /dspace/xsl
    		
  10. Add the following appender definition to the log4j.properties file, substituting appropriate values for $DSPACE_DATABASE_USER and $DSPACE_DATABASE_USER_PASSWORD for your installation:
    # A2 appender JDBCAppender
    log4j.appender.A2=org.apache.log4j.jdbc.JDBCAppender
    
    log4j.appender.A2.URL=jdbc:postgresql://localhost:5432/dspace
    log4j.appender.A2.user=$DSPACE_DATABASE_USER
    log4j.appender.A2.password=$DSPACE_DATABASE_USER_PASSWORD
    log4j.appender.A2.sql=INSERT INTO templog (date, logger, priority, message) VALUES ('%d', '%c', '%-5p', '%m%n')
    log4j.appender.A2.layout=org.apache.log4j.PatternLayout
    Also add the A2 appender by modifying the existing rootCategory entry as follows:
    log4j.rootCategory=INFO, A1, A2
  11. Rebuild and redeploy DSpace. Go to the MyDSpace page to access the statistics.
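
Before rebuilding, it can help to confirm that the settings from step 9 are actually present in dspace.cfg. The following is a minimal sketch, not part of the ANU distribution: the function names and the sample text are hypothetical, and it assumes the file uses simple "key = value" lines with # comments.

```python
# Hypothetical helper, not shipped with the statistics add-on.
# Keys taken from step 9 of the installation instructions.
REQUIRED_KEYS = [
    "stylesheet.dir",
    "report.enabled",
    "report.querylist.file",
    "report.stylesheet.dir",
]

def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    settings = parse_properties(text)
    return [key for key in required if not settings.get(key)]

# Sample configuration text with one setting deliberately left out.
sample = """
##### ANU Statistics Settings #####
stylesheet.dir = /dspace/xsl
report.enabled = true
report.querylist.file = /dspace/config/querylist.xml
"""
print(missing_keys(sample))  # prints ['report.stylesheet.dir']
```

To check a real installation, read the contents of your own dspace.cfg and pass the text to missing_keys.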

Installation Instructions - Cocoon reports

This section covers the installation of the graphical Cocoon reports. To install:

  1. Install the XML and HTML reports as detailed in the previous section.
  2. Install Cocoon 2.1.5 or greater
  3. cp -R cocoon/charts $COCOON_DIR/build/webapp
  4. cp -R cocoon/dspace-logs $COCOON_DIR/build/webapp
  5. In the dspace-logs Cocoon webapp:
  6. cd cocoon
  7. ant -Dcocoon-dir=$COCOON_DIR to compile a custom Cocoon generator, substituting $COCOON_DIR with your Cocoon installation directory.
  8. cp -R src/au $COCOON_DIR/build/webapp/WEB-INF/classes
  9. Add the following to the dspace.cfg file, using values appropriate to your installation:
    ##### ANU Cocoon Statistics Settings #####
    report.cocoon.enabled = true
    report.cocoon.chart.url = http://mydspace.myorg:8888/charts/
    report.cocoon.stats.url = http://mydspace.myorg:8888/dspace-logs/main
    dspace.url.withport = http://mydspace.myorg:8080
  10. Copy dspace.jar and postgresql.jar into $COCOON_DIR/build/webapp/WEB-INF/lib
  11. Ensure Postgres is set up to accept queries from the machine running Cocoon. For example, if Cocoon is installed on the same server as DSpace, an entry similar to host dspace dspace 127.0.0.1 255.255.255.255 md5 is required in the pg_hba.conf file.
  12. Start up Cocoon and restart Tomcat

Quick User Guide

After you log into DSpace, the MyDSpace page will show a number of buttons that give access to the reports. The reports are defined in querylist.xml; some samples are included in the distribution, and any number can be added or deleted dynamically (no restart of Tomcat is required). If you installed the Cocoon extensions and have Cocoon running, you will also be able to access the graphical reports.

A simple overview of the components involved in the statistics add-on is shown below.

Image showing an overview of statistics components

A simple use case: a user explores a DSpace collection, during which time log4j appends all log events to the templog table. For example, the user accesses a bitstream, which results in the following record being written to the database:

date  = 2005-06-22 11:23:54,461
logger  = org.dspace.app.webui.servlet.RetrieveServlet
priority = INFO
message = anonymous:session_id=600F15A4E8056EBD23C575F056A0474C:view_bitstream:bitstream_id=6110:ip_addr=150.203.59.132

The trigger function analyses the data and records the following data in the view_bitstream_log table:

 bitstream_id =  6110
 item_id = 6109 
 session_id =  600F15A4E8056EBD23C575F056A0474C
 user_id =  anonymous
 date =  2005-06-22
 time =  11:23:54.4610
 remote_ip =  150.203.59.132
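
The kind of analysis the trigger performs can be sketched outside the database. This is an illustration only, not the distribution's trigger code: the function name is hypothetical, and it assumes the timestamp is stored in the form 2005-06-22 11:23:54,461 and that the message is a colon-separated list of a user, key=value pairs and a bare event name. The item_id column is left out because it has to be looked up in the database from the bitstream ID.

```python
from datetime import datetime

def parse_view_event(date_text, message):
    """Split one templog row into fields like those in view_bitstream_log."""
    # Assumed timestamp layout: 'YYYY-MM-DD HH:MM:SS,mmm'.
    stamp = datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S,%f")
    # First field is the user; the rest are key=value pairs plus the event name.
    user_id, *rest = message.strip().split(":")
    event, fields = None, {}
    for part in rest:
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
        else:
            event = key
    return {
        "event": event,
        "user_id": user_id,
        "session_id": fields.get("session_id"),
        "bitstream_id": fields.get("bitstream_id"),
        "remote_ip": fields.get("ip_addr"),
        "date": stamp.date().isoformat(),
        "time": stamp.time().isoformat(timespec="milliseconds"),
    }

row = parse_view_event(
    "2005-06-22 11:23:54,461",
    "anonymous:session_id=600F15A4E8056EBD23C575F056A0474C:view_bitstream:"
    "bitstream_id=6110:ip_addr=150.203.59.132\n",
)
print(row["event"], row["bitstream_id"], row["date"], row["time"], row["remote_ip"])
```

Splitting on ":" is only safe for IPv4 addresses; an IPv6 address in ip_addr would need a different delimiter strategy.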

This data can be retrieved by queries specified in the querylist.xml file. For example, using the form generator servlet the data can be retrieved as XML, HTML or a graph (Cocoon only, JPEG or SVG).

Selecting the report format displays a page of queries. The example below shows the representation of the 'items-viewed-in-collection-inTime' query, which is defined in the querylist.xml file.

Image showing an example query page

A dissection of the querylist.xml entry for this query is as follows:

<query name="items-viewed-in-collection-inTime" title="Items Viewed in a Period of Time">

Optional transformation specification. Each option should define a type and a stylesheet. The result processor will transform the result set using the stylesheet for the format nominated by the user.

<option name="use-xsl-transform" type="xml" stylesheet="identity.xsl" render-to="xml document"/>
<option name="use-xsl-transform" type="html" stylesheet="resultset2table.xsl" render-to="table"/>
<option name="use-xsl-transform" type="graph" stylesheet="resultset2dataSet.xsl" render-to="linechart.jpg"/>

Required parameters. The form generator will produce input fields for each element to collect parameter values.

<param src="collectionlist4Eperson" name="collection IDs" id="p3"/>
<param name="date From (DD-MM-YYYY)" id="p1"/>
<param name="date To (DD-MM-YYYY)" id="p2"/>
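
To make the roles of the option and param elements concrete, the sketch below reads a query entry and lists the stylesheet for each output type and the input fields a form would need. It is an illustration under assumptions, not the add-on's own form generator: describe_query is a hypothetical name, and the entry is closed with a </query> tag and has its sql element omitted so that the sample is well-formed on its own.

```python
import xml.etree.ElementTree as ET

# Sample entry assembled from the fragments above (sql element omitted).
QUERY_XML = """
<query name="items-viewed-in-collection-inTime" title="Items Viewed in a Period of Time">
  <option name="use-xsl-transform" type="xml" stylesheet="identity.xsl" render-to="xml document"/>
  <option name="use-xsl-transform" type="html" stylesheet="resultset2table.xsl" render-to="table"/>
  <option name="use-xsl-transform" type="graph" stylesheet="resultset2dataSet.xsl" render-to="linechart.jpg"/>
  <param src="collectionlist4Eperson" name="collection IDs" id="p3"/>
  <param name="date From (DD-MM-YYYY)" id="p1"/>
  <param name="date To (DD-MM-YYYY)" id="p2"/>
</query>
"""

def describe_query(xml_text):
    """Return ({type: stylesheet}, [(id, label, src), ...]) for one query entry."""
    query = ET.fromstring(xml_text.strip())
    stylesheets = {
        option.get("type"): option.get("stylesheet")
        for option in query.findall("option")
    }
    params = [
        (param.get("id"), param.get("name"), param.get("src"))
        for param in query.findall("param")
    ]
    return stylesheets, params

stylesheets, params = describe_query(QUERY_XML)
print(stylesheets["html"])        # stylesheet used for the HTML report
for param_id, label, src in params:
    print(param_id, label, src)   # src is None for free-text parameters
```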

The SQL query definition element. The report generator will replace the parameters with the input values and execute the query.

<sql>

SELECT collection_id, date, name, sum(volume) FROM (
    SELECT c2i.collection_id, vil.date, cl.name, count(vil.item_id) AS volume
      FROM view_item_log vil, collection2item c2i, collection cl
      WHERE vil.item_id = c2i.item_id
      AND c2i.collection_id IN (p3)
      AND cl.collection_id = c2i.collection_id
      AND date < to_date('p2', 'DD-MM-YYYY')
      AND date > to_date('p1', 'DD-MM-YYYY')
      AND vil.remote_ip NOT IN (SELECT remote_ip FROM ip_filter)
      GROUP BY 2,3,1
  UNION ALL
    SELECT c2i.collection_id, vbl.date, cl.name, count(vbl.item_id)
      FROM view_bitstream_log vbl, collection2item c2i, collection cl
      WHERE vbl.item_id = c2i.item_id
      AND c2i.collection_id IN (p3)
      AND cl.collection_id = c2i.collection_id
      AND date < to_date('p2', 'DD-MM-YYYY')
      AND date > to_date('p1', 'DD-MM-YYYY')
      AND vbl.remote_ip NOT IN (SELECT remote_ip FROM ip_filter)
      GROUP BY 2,3,1
  ) AS Foo
  GROUP BY 2,3,1
  ORDER BY 2
</sql>
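
The replacement of p1, p2 and p3 with input values can be pictured as a whole-word text substitution. The sketch below is an assumption about the mechanism rather than the report generator's actual code; fill_parameters and validate are hypothetical names, and the validation step is added here because substituting raw user input into SQL text is unsafe without it.

```python
import re

def validate(values):
    """Reject anything that is not a DD-MM-YYYY date or a list of integer IDs."""
    for name in ("p1", "p2"):
        if not re.fullmatch(r"\d{2}-\d{2}-\d{4}", values[name]):
            raise ValueError(f"{name} must look like DD-MM-YYYY")
    if not re.fullmatch(r"\d+(,\d+)*", values["p3"]):
        raise ValueError("p3 must be a comma-separated list of integers")

def fill_parameters(sql, values):
    """Replace whole-word placeholders (p1, p2, ...) with validated values."""
    validate(values)
    # \b keeps the match to standalone tokens such as (p3) or 'p2'.
    return re.sub(r"\bp\d+\b", lambda m: values.get(m.group(0), m.group(0)), sql)

# A fragment of the query above, with values like those in the sample output.
fragment = (
    "c2i.collection_id IN (p3) "
    "AND date < to_date('p2', 'DD-MM-YYYY') "
    "AND date > to_date('p1', 'DD-MM-YYYY')"
)
values = {"p1": "10-05-2005", "p2": "10-07-2005", "p3": "4,5,9"}
print(fill_parameters(fragment, values))
```

Where the database driver allows it, bound parameters are generally a safer choice than text substitution; the substitution form is shown only because it mirrors the placeholder style used in querylist.xml.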

For the selected collections and date range, the transformers produce the following result sets:

XML

<resultset EPersonID="1" EpersonName="Leo Monus" QueryTitle="Items Viewed in a Period of Time" date="Wed Jun 22 16:11:54 EST 2005" p1="10-05-2005" p2="10-07-2005" p3="4,5,9,10,11,12,13,14,16" rName="items-viewed-in-collection-inTime" render-to="xml document" scaleBy="day" xsltype="xml">
	<result><collection_id>10</collection_id><date day="131" month="05" week="20" year="2005">2005-05-11</date><name>CM:1</name><sum>1</sum></result>
	<result><collection_id>9</collection_id><date day="131" month="05" week="20" year="2005">2005-05-11</date><name>CM:40</name><sum>14</sum></result>
	<result><collection_id>11</collection_id><date day="131" month="05" week="20" year="2005">2005-05-11</date><name>cm-40-b</name><sum>21</sum></result>
	<result><collection_id>5</collection_id><date day="131" month="05" week="20" year="2005">2005-05-11</date><name>test..1..2..3</name><sum>7</sum></result>
	<result><collection_id>12</collection_id><date day="132" month="05" week="20" year="2005">2005-05-12</date><name>cm-1-b</name><sum>13</sum></result>
	<result><collection_id>11</collection_id><date day="132" month="05" week="20" year="2005">2005-05-12</date><name>cm-40-b</name><sum>18</sum></result>
	<result><collection_id>4</collection_id><date day="132" month="05" week="20" year="2005">2005-05-12</date><name>potr loadtest</name><sum>28</sum></result>
	<result><collection_id>5</collection_id><date day="132" month="05" week="20" year="2005">2005-05-12</date><name>test..1..etc.
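
An XML result set of this shape is straightforward to post-process outside DSpace. The sketch below totals the view counts per date; it is a hypothetical consumer, not part of the add-on, and it uses a shortened, well-formed sample (three rows taken from the listing above, most root attributes dropped, closing tag added) because the listing itself is truncated.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened sample: three rows from the listing, closed so that it parses.
SAMPLE = """
<resultset rName="items-viewed-in-collection-inTime" scaleBy="day">
  <result><collection_id>10</collection_id><date day="131" month="05" week="20" year="2005">2005-05-11</date><name>CM:1</name><sum>1</sum></result>
  <result><collection_id>9</collection_id><date day="131" month="05" week="20" year="2005">2005-05-11</date><name>CM:40</name><sum>14</sum></result>
  <result><collection_id>12</collection_id><date day="132" month="05" week="20" year="2005">2005-05-12</date><name>cm-1-b</name><sum>13</sum></result>
</resultset>
"""

def views_per_date(xml_text):
    """Add up the <sum> values of all <result> rows, grouped by <date> text."""
    root = ET.fromstring(xml_text.strip())
    totals = defaultdict(int)
    for result in root.findall("result"):
        day = result.findtext("date").strip()
        totals[day] += int(result.findtext("sum"))
    return dict(totals)

print(views_per_date(SAMPLE))  # prints {'2005-05-11': 15, '2005-05-12': 13}
```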

HTML

Image showing example HTML-format query results

Cocoon Graph

Image showing example Cocoon graph format query results