Scratchpad hosting

Simon Rycroft, Ben Scott (NHM), Dominik Röpert & Lorna Morris (BGBM)

KEY RESOURCES
Distributing Scratchpad servers report
Test version of distributed Scratchpad server report
Report on Scratchpad usage statistics options

Developing the Scratchpad Servers

Computers are inevitably subject to failures beyond the control of the programmer, such as power cuts and network outages. To maintain continuity of service, and therefore a reputation for reliability, ViBRANT is creating a network of Scratchpad servers at different physical locations that can stand in for machines that become unavailable.

This is the first step towards a vision in which individual Scratchpads are hosted on different servers but share the same code base and are upgraded together.

Mirroring

A mirror site is a remote copy that reflects changes in the original. A test version of a mirror of the main Scratchpad server (Quartz) has been developed using the same operating system and software as Quartz, enabling it to be quickly and easily configured for use. Implementing this test server has enabled us to develop a script that automates much of the process of creating a Scratchpad mirror.

There are various ways to build a mirror, each with different benefits and costs. We decided to use only Ægir to keep a mirror site up to date with its origin site, primarily to keep things as simple as possible.

Distributing Scratchpad servers

Until now, the only multi-site Scratchpad installation has been the one hosted at the NHM. ViBRANT plans to create a Scratchpad distribution that other users and institutions can download and install on their own servers.

The next step will be to start creating Scratchpads whose primary server is the BGBM machine. These Scratchpads would then be mirrored on the NHM server. This process cannot be started, however, until we have made significant changes to the way we currently handle our DNS. Once the DNS changes have been made, there will be no restriction on the number of Scratchpad nodes we can have running across the ViBRANT partners, and therefore no limit on the number of Scratchpad users or the level of usage.

Scratchpad integration

Each Scratchpad is a separate, independent entity whose data are owned by the group that manages the site. Nevertheless, it would be useful to be able to ask questions across all Scratchpad sites, such as "which sites say something about the taxon Aus bus?". We will achieve this using the Scratchpad registry. The registry is intended to run as a service at the NHM and will allow Scratchpads, whether hosted by the NHM or by other institutions or individuals, to report their presence and statistics. This process will be fully automated and will be enabled by default on all Scratchpads. The exact configuration of the Scratchpad registry is yet to be finalised, although it is likely to be very similar to GBIF's Integrated Publishing Toolkit (IPT) registry system.
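Since the registry configuration is not yet finalised, the following is only a minimal sketch of the idea: sites report their presence, and the registry answers a cross-site question such as the taxon query above. Every name here (SiteRecord, Registry, report, sites_mentioning) is hypothetical, and the in-memory dictionary stands in for whatever service the NHM eventually runs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SiteRecord:
    """One Scratchpad's entry in the (hypothetical) registry."""
    url: str
    host: str  # institution running the server, e.g. "NHM" or "BGBM"
    taxa: set[str] = field(default_factory=set)


class Registry:
    """In-memory stand-in for the registry service (an assumption)."""

    def __init__(self) -> None:
        self._sites: dict[str, SiteRecord] = {}

    def report(self, record: SiteRecord) -> None:
        # A site that reports again simply overwrites its earlier entry.
        self._sites[record.url] = record

    def sites_mentioning(self, taxon: str) -> list[str]:
        # Case-insensitive match on the full taxon name.
        wanted = taxon.casefold()
        return sorted(
            url
            for url, rec in self._sites.items()
            if any(t.casefold() == wanted for t in rec.taxa)
        )


registry = Registry()
registry.report(SiteRecord("http://a.example", "NHM", {"Aus bus"}))
registry.report(SiteRecord("http://b.example", "BGBM", {"Cus dus"}))
print(registry.sites_mentioning("Aus bus"))
```

In this sketch the sites push their own records, which matches the plan for automated reporting that is enabled by default; a real registry would also need to handle sites that stop reporting.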

A secondary issue is the collection of usage statistics, discussed in Brake et al. (2011)[1]: the registry will also collect basic information about the rate of use of each site. We will continue to collect usage statistics using Google Analytics, which currently aggregates statistics across all Scratchpads and is therefore of great benefit to us. However, Google Analytics can only offer this aggregation for an individual server and the Scratchpads it hosts, so moving to a distributed model requires some redesign of the data collection.

There are privacy concerns regarding the collection of usage information. Users will be advised in the terms and conditions of each Scratchpad that site usage statistics will be collected, and every effort will be made to ensure that statistics are anonymised before publication or distribution, in compliance with relevant codes for data protection. We note that it may not be possible to collect such data from servers in some countries because of different legal frameworks.

The basic information we want to collect is:

  • Number of users, and statistics regarding the login of those users.

  • Quantity of data entered into the site, including statistics regarding the types of data entered and by whom.

  • Statistics regarding the use of the site by anonymous users (visitors who are not logged in and so are not contributing content). These will be additional to the data collected from Google Analytics or a similar tool.

  • Metadata about the site, helping to group analysis of multiple Scratchpads together (e.g. are bird Scratchpads more popular than banana Scratchpads?).
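The items above could be carried in a single per-site record, sketched below. The field names, the 30-day login window and the "group" tag are illustrative assumptions only and do not describe an agreed registry schema. The helper shows how site metadata would let statistics from several Scratchpads be analysed together.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class SiteStats:
    """Hypothetical statistics record reported by one Scratchpad."""
    site: str
    group: str                     # metadata tag, e.g. "birds" (assumed)
    users: int                     # number of registered users
    logins_last_30_days: int       # login statistic (window is an assumption)
    nodes_by_type: dict            # quantity of data entered, by content type
    anonymous_visits: int          # use by visitors who are not logged in


def mean_visits_by_group(stats):
    """Average anonymous visits per site, grouped by the metadata tag."""
    grouped = defaultdict(list)
    for s in stats:
        grouped[s.group].append(s.anonymous_visits)
    return {group: mean(values) for group, values in grouped.items()}


example = [
    SiteStats("site-a", "birds", 10, 5, {"page": 3}, 100),
    SiteStats("site-b", "birds", 4, 2, {"page": 1}, 300),
    SiteStats("site-c", "bananas", 2, 1, {}, 50),
]
print(mean_visits_by_group(example))
```

Only aggregate counts appear in the record, which is one simple way of keeping the reported data free of information about individual users.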

Usage statistics are an important project management tool: they help to prioritise further development of a project like ViBRANT. See also 'Support Services'.

Scratchpad usage statistics from February 2007 to September 2011. The black dashed line represents the number of Scratchpad community sites (in hundreds) and the blue solid line represents the number of registered users (in thousands). As of September 2011 we have switched to recording the number of active users (currently 4424) since this figure provides a more accurate guide to usage.


References