A Web Observatory or A Web Of Observatories – it’s all about shadows

artwork by Tim Noble and Sue Webster @ http://www.thisismarvelous.com/

One of the big questions I hear coming up again and again is around clarifying this difference – people often talk about A Web Observatory and then (without drawing breath) throw in a comment about THE Web Observatory.

Are these the same thing? Should we care about the difference between A and THE?

Let me point out the difference between A web server and THE Web:

  • You can own/manage a web server – no-one owns or manages the Web
  • If your web page or web server goes away, only your part of the Web disappears

In effect, you can think of the Web as something that emerges from the Web Servers that are in operation – it’s the shadow that is cast by all web servers and Web content. Sometimes a shadow looks very different from the individual pieces that combined to cast it.

THE Web Observatory is the shadow that is cast by all the individual Observatories, Datasets and Apps that are in operation.

The arrival of OSOs …

I was talking to my son about an edition of Mythbusters where they were looking at, of all exotic things, Anvils.  The piece was about making the distinction between REAL anvils (which can be used for iron-work) and apparently similar artefacts (that are only for decoration).

The meme became Anvils vs. ASOs or Anvil-shaped Objects.

I realise this is an idea I’ve been waiting for in the world of Web Observatories: a way to talk about systems that may be similar or even identical in function to Observatories but don’t use the name and may not be focussed on this approach.

Thus I’ve coined the term OSO – the Observatory-shaped Object – to denote systems which are close to the sort of WO system we are talking about. They may not have been designed or intended as an observatory, but they have the potential to act as one or to be extended to become one.

A classic example of an OSO is the Southampton University ePrints system, which started life as a document repository, but which has been extended to harvest data sets (e.g. from Twitter), to host data sets and link them to academic papers and, critically, to locate and index the existence of other repositories with other data and docs.

So now we have WOs and OSOs !!!

(With thanks to Harrison Brown and Mythbusters)

Types of Web Observatory

The sheer complexity of the types of process that a Web Observatory might support cries out for a more refined definition of Observatories. As part of my own research I have identified an initial structure that will be tested with communities going forward.

The major categories are:


These are research-based systems attempting to capture/share data, produce/test theory and collaborate on research projects.


These are commercial systems attempting to improve financial ratios (especially profit) and share ratios (especially market share).

    Communities (ranging from small to large)

These are engagement systems attempting to highlight information in order to modify behaviour and encourage actions/participation.

The sub-types are:

    Personal – i.e. a community of one
    Communities of Interest or Location, e.g. Charities
    Communities of Governance, e.g. Government

Web Observatory Facets

Feb 2014

Here is a Concept Map for an Observatory, highlighted to show which of the features/concepts have been implemented in a particular case.


Here is a first pass at generating the elements of a faceted hierarchy for Web Observatories. These are concepts/foci generated from a textual/thematic analysis of academic papers and other materials around the design and implementation of Observatories.

Web Science and the Web Observatory: the changing remit of web curation for research, enrichment and cultural preservation.

Web Science has been defined as the study of social machines – the hybrid human/virtual solutions and processes that result from the use in society of information and information systems on the Web. Tim Berners-Lee (2009) described them thus:

Real life is and must be full of all kinds of social constraint – the very processes from which society arises. Computers can help if we use them to create abstract social machines on the Web: processes in which the people do the creative work and the machine does the administration. . . The stage is set for an evolutionary growth of new social engines. The ability to create new forms of social process would be given to the world at large, and development would be rapid.

The Web has evolved beyond a collection of static HTML pages for academic research to a platform for human interaction in all its forms and an open conduit for publishing and self-expression. These social machines exist in Government, Science, Art, Crime, Health and in many virtual categories beyond.

The desire to retain such data, which has been deliberately and explicitly put onto the Web, may fit within a widening remit for archivists to preserve works of art, literature, science and other traditional “publications”. Yet an increasing body of data is being added that is not explicitly published by any individual but instead comes via a technical platform or channel and is about a topic, about “society” and/or about specific groups globally. This is an immensely valuable resource and may help us to model evolving trends and behaviour in society through the expression of activities on the Web. Not least, this resource may help historians understand the 21st Century through more detailed records than have ever been available before the advent of the Web.

This form of publication may be a problematic fit with traditional policies, and libraries globally are reacting quickly to expand and re-define what it is to be a library, what constitutes an artefact for preservation and what deserves space in growing (but ultimately limited) digital collections.

This change is not without challenges. Not least, the donation of the entire Twitter corpus (growing at some 500 million messages per day) to the Library of Congress has highlighted the huge operational issues which come with data at Web scale.

How then can we decide which data to preserve and which to discard? Most recently the Web has been leveraged as a source of huge amounts of information/metadata known as “big data”, much of which may be considered the “exhaust” of other human activities on telephone, social media and other networks. In the coming decade, growing portions of the electronic fabric of society, ranging from cars to cameras to pacemakers and household devices, may be brought on-line to form an Internet of Things whose data output is predicted to dwarf even the huge volumes of web data we collect today.

Such volumes of data cannot currently be captured and stored in their entirety using available technologies. Beyond this, the challenges of curation, access, licensing and presentation may take many more years to resolve, even once a storage solution exists. Museums and libraries have faced the selection challenge for as long as we have had limited shelf space, but with the virtually unlimited digital shelf the choices may become less about inclusion/exclusion and more about the resolution of the data stored.

For example: one copy of a particular newspaper vs. weekly examples of selected newspapers vs. all copies of all newspapers.

In Web Science we are happy to embrace the value that may come from analysing Big Data without assuming that more data is necessarily better per se or that Big Data is inherently insightful or even meaningful! Our chosen tool for Web Science is the development of an instrument akin to a stethoscope in medicine or a telescope in Astronomy – namely the Web Observatory. Something to help us observe the digital footprints left by society as a whole rather than directly observing/intruding into personal spaces.

A single web observatory is a data repository in which data ON the Web or data ABOUT the Web is collected and in turn made available to other users via portals, interfaces or visualisations. Users may in turn add or return data to the Observatory based on their own findings, research or experiences.

Whilst some have argued that only data ABOUT the Web (webometrics, cybermetrics, bibliometrics etc.) should be considered as a focus for study, it seems apparent that as data is put ON the Web, the resulting behaviours may be detected in the responses of users, often through data ABOUT the Web (locations, times, tagging/classification etc.). These are metadata items akin to the metadata held ABOUT library collections, and are distinct from the books or artefacts themselves. Thus the argument about data vs. metadata becomes a chicken/egg question of socio-technical effects: does the technical change the social or vice versa?

One of the key challenges around the curation of so much data across so many perspectives is to encourage a reduction of waste/repetition so that data gathered by Observatory A may be discovered and re-used/re-purposed by Observatory B. We are looking forward to a collection of interoperating observatories that will form the fabric of a World-wide Web Observatory. These many distributed repositories/collections may be based on varied technologies and approaches, but we are calling for shared standards of identification, metadata, licensing and collaboration. Such an emergent system may accelerate the efforts of academic researchers by linking publications to underlying research data and to notes about usage and methodology, and may allow the study of hybrid data assets and synthetic artefacts that are available nowhere else.
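To make the idea of discovery and re-use a little more concrete, here is a minimal Python sketch of how a shared descriptor might let Observatory B find data that Observatory A has already gathered. Everything in it is an assumption made for illustration: the `DatasetRecord` fields, the `Catalogue` class, the `urn:example:` identifiers and the licence string are hypothetical and do not describe any existing Web Observatory standard or system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical shared descriptor; the field names are illustrative only."""
    identifier: str   # globally unique ID agreed between observatories
    title: str
    licence: str      # terms under which the data may be re-used
    observatory: str  # which observatory holds the data
    keywords: frozenset = frozenset()


class Catalogue:
    """A toy index built from the listings that observatories publish."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # A shared identifier lets us notice data that was already gathered
        # elsewhere, so the first holder is returned instead of a duplicate.
        existing = self._records.get(record.identifier)
        if existing is not None:
            return existing
        self._records[record.identifier] = record
        return record

    def discover(self, keyword):
        # Return every registered record tagged with the keyword, in a stable order.
        matches = (r for r in self._records.values() if keyword in r.keywords)
        return sorted(matches, key=lambda r: r.identifier)


catalogue = Catalogue()

# Observatory A publishes a description of a data set it has gathered.
catalogue.register(DatasetRecord(
    identifier="urn:example:tweets-sample",
    title="Sampled tweets",
    licence="CC-BY-4.0",
    observatory="Observatory A",
    keywords=frozenset({"twitter", "social media"}),
))

# Observatory B is about to gather the same data and finds it already exists.
kept = catalogue.register(DatasetRecord(
    identifier="urn:example:tweets-sample",
    title="Sampled tweets",
    licence="CC-BY-4.0",
    observatory="Observatory B",
    keywords=frozenset({"twitter"}),
))
print(kept.observatory)
print([r.title for r in catalogue.discover("twitter")])
```

The point of the sketch is only that agreement on identifiers, metadata and licensing is what makes the lookup possible; the storage technology behind each observatory can remain as varied as it likes.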

The pressing need is to co-ordinate the skills of the archivists with the needs of the researchers and the understanding of the technologists into an interdisciplinary view of how the Web could be sampled, preserved and made available as an invaluable tool for current researchers, historians and for future generations.