A Web Observatory or A Web Of Observatories – it's all about shadows

artwork by Tim Noble and Sue Webster @ http://www.thisismarvelous.com/

One of the big questions I hear coming up again and again is around clarifying this difference – people often talk about A Web Observatory and then (without drawing breath) throw in a comment about THE Web Observatory.

Is this the same thing? Should we care about the difference between A and THE..?

Let me point out the difference between A web server and THE Web ..

  • You can own/manage a web server – no-one owns or manages the Web
  • If your web page or web server goes away only your part of the Web disappears

In effect, you can think of the Web as something that emerges from the Web Servers that are in operation – it's the shadow that is cast by all web servers and Web content. Sometimes a shadow looks very different from the individual pieces that combined to cast it.

THE Web Observatory is the shadow that is cast by all the individual Observatories, Datasets and Apps that are in operation.

The arrival of OSOs …

I was talking to my son about an edition of Mythbusters where they were looking at, of all exotic things, Anvils.  The piece was about making the distinction between REAL anvils (which can be used for iron-work) and apparently similar artefacts (that are only for decoration).

The meme became Anvils vs. ASOs or Anvil-shaped Objects.

I realise this is an idea I’ve been waiting for in the world of Web Observatories to talk about systems that may be similar or even identical in function to Observatories but don’t use the name and may not be focussed on this approach.

Thus I’ve coined the term OSO – the Observatory-shaped Object – to denote systems which are close to the sort of WO system we are talking about: systems that were not designed/intended to be an observatory but have the potential to act as an Observatory or be extended to become one.

A classic example of an OSO is the Southampton University ePrints system, which started life as a document repository, but which has been extended to harvest data sets (e.g. from Twitter), to host data sets and link them to academic papers and, critically, to locate and index the existence of other repositories with other data and docs.

So now we have WOs and OSOs !!!

(With thanks to Harrison Brown and Mythbusters)

From Search to Observation: an update


Often the first question I get asked in this space is WHAT IS an Observatory? – In the fullness of time I have come to believe this is not a particularly engaging or rewarding question. IT technologies can be highly similar across a diverse set of platforms and applications ranging from Apple watches to warehouse management systems.

For example, things which appear to share a moderate level of common building blocks can be radically different in their appearance and function. Whilst it is popular to point out that your DNA is 99% similar to that of a chimp – what is less often pointed out is that your DNA is 30% similar to a Daffodil! … and so instead of building blocks, I find myself focussing much more on two other questions: HOW IS an Observatory? and WHY IS an Observatory?

Here the variations of what is done with the Observatory technology and who is applying it for what reasons seem to be much more relevant and, in any case, much more interesting.

With this in mind I have recently revisited an earlier paper called “From Search to Observation” in which I argue that even though Search Engines and Observatories share many common architectural elements (databases, APIs, graphics/analytics etc.), the essence of what they are trying to do is NOT identical.

I initially identified more than a dozen processes that seemed to emerge from an analysis of the literature and the wider dialogue in this space and published these for community feedback – not much dissent so far.

More recently I expanded this analysis on the back of a series of interviews with more than 50 participants and the resulting process list more than tripled what we had seen – particularly as I started to distinguish between input/output factors and internal processes.

An updated paper has been prepared but is not yet published. Given the restrictions on page count, a list of more than 60 factors could not easily be reproduced there, and so I have included these here for those who wish to comment, refute or discuss the existence or classification of these factors/processes.


Brief Process and Factor definitions


Describes the existence of wide-spread recognition of a theme, person/group, resource/tool or other entity which sets expectations around priority, importance and inclusion of the said entity.

e.g. there may be no evidence to support the idea that Tweets are particularly more accurate, enlightening or relevant than micro-posts from any other source and yet the immense cultural impact Twitter has had almost certainly skews expectations of its inclusion in analysis and consideration.


Impact of the cost of usage/operation of WO systems and tying into later emergent Cost-Benefit assessment.

Corporate Structure

Which may affect how/where organisations (not only commercial organisations) are able to participate in terms of authority, jurisdiction, charter, stakeholder impact.


The aspect of connecting to existing groups or creation of new groups via the use of WO – especially where available data/tools align with the objectives and interests of a community.


Technical barriers to entry/participation vs simpler user experience are naturally likely to impact the quantity and quality of participation in WO systems.


Describes the ambient level of interaction that draws homogeneous and heterogeneous groups together.

Commercial interests

Describes the existing tendencies around market-share, intellectual property and control which may affect what users are prepared to share and under which conditions.


Describes the process of collecting data/metadata about the WO interaction which might comprise information on the data, the data-source etc

Consign (data to WO)

Describes the process of depositing/linking a data set or tool to the WO

Conspicuous collection

Deals with the impact of making explicit that data is being collected and, potentially, publishing the data or analysis of the data such that previous behaviour may be affected by the disclosure.

Conflict (+ conflict resolution)

Describes the process of resolution regarding some asserted fact in the WO (ownership, value, usage, permission etc)


In contrast to a single request/response from a known search engine, the process of observation may be characterised as one or more communication processes across several repositories, starting with discovery of sources, the disclosure of metadata, the negotiation/establishment of technical data exchange and the grant (either manual or technical) of licenses.

Canonical Sources

Where more than one repository offers the same or overlapping datasets there will be the requirement to establish a de facto or canonical source.


Clarification is a multi-step process to ascertain values through supplementary enquiry. This may apply to questions of provenance, usage or cost.


Observers will typically need to access the raw data from the one or more repositories which their search has identified – hence further individual connection protocols and processes will be required.


Where the observer’s process requires the source (publisher) of the data to be confirmed and explicitly documented, a certificate format and certification process may be required.

Charging Models

It is not anticipated that all data that will be observed will necessarily be open data and hence provided free of charge. It is anticipated that observations may involve the payment of a license fee with a mechanism to grant the permissions associated with the license vs those without a license or with a different license


Aspects of access and privacy must be addressed to ensure that data/services are accessible according to legal and ethical standards.


Allows for the process of adjusting/modifying some data/service in line with a known size of external effect such as error or bias.


It is anticipated that meta data including commentary by both users and curators of the data will provide a richer environment for a qualitative understanding of data beyond stored value

Capture / Charge / Crowdsource

The process of providing data to the system through a series of individual events (capture), through the bulk upload of a dataset (charging) or through manual input (crowd-sourcing)


Observation will often be associated with longitudinal datasets from one or more sources. Whilst it is not envisaged that all observatories will seek to store all data, it is anticipated that each observatory would store some data and hence a process of regular collection, snapshotting or processing of streaming data would be required.


Each data set may form part of a larger analytic or visualisation requiring a series of one or more computations.


The process by which an accurate output/outcome may depend on a set of meta-data (relating to the user or the problem statement) as well as the data itself.


Each repository may hold datasets in a variety of formats – metadata associated with the dataset will allow the observer to invoke appropriate format conversion services
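As a rough illustration of metadata-driven conversion, the following Python sketch picks a converter based on a format field in the dataset's metadata. The `format` key, the `CONVERTERS` table and the function names are assumptions made for this example only; they do not describe any existing WO implementation.

```python
import csv
import io
import json


def csv_to_records(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def json_to_records(text):
    """Parse JSON text that already holds a list of records."""
    return json.loads(text)


# Hypothetical registry mapping a metadata format label to a converter.
CONVERTERS = {
    "text/csv": csv_to_records,
    "application/json": json_to_records,
}


def load_dataset(metadata, payload):
    """Use the dataset's metadata to invoke the appropriate converter."""
    fmt = metadata["format"]
    if fmt not in CONVERTERS:
        raise ValueError(f"no converter registered for format {fmt!r}")
    return CONVERTERS[fmt](payload)


# The same records arrive in two formats from two (imaginary) repositories.
from_csv = load_dataset({"format": "text/csv"}, "city,count\nSouthampton,3\n")
from_json = load_dataset(
    {"format": "application/json"},
    '[{"city": "Southampton", "count": "3"}]',
)
```

Both calls yield the same list of records, so downstream analysis need not care which format a repository happened to hold.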


A composite data set comprising heterogeneous data will allow for the possibility of correlation analysis across disjoint datasets.


The data/service which is identified from a repository as part of an observation service would be formally classified according to topics using some knowledge classification schema, some access schema and may also be linked to other data or services in the Observatory


The process by which results are created between users and/or between users and machines, i.e. construction involving more than one participant.


Datasets addressing specific research questions may typically be assembled from more than one data source with either homogeneous or heterogeneous structures allowing for richer analysis of trends and correlation. This may fall into the area of big data or broad data.


The creation of directories of links to locally/remotely hosted data and services


Addressing the exchange of academic credit for the use of the materials or work of others through a formal reference subject to bibliometric analysis.


The creation of logical service/data sub-structures based on community membership, permissions, licenses, confidentiality, jurisdiction or other suitable frameworks


Each series of observations may be made in the context of a research question which informs the relevant curation, commentary and collaboration addressing the research question. The context also informs the services/datasets that are published out to external users of the Observatory


To allow partial access to data and/or services by means of user permissions, grant of license, time/date restrictions or other contextualisations of the entities (data, services or users).
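One way to picture this kind of partial access is a grant that ties a user to a dataset for a limited period. The Python sketch below is only an illustration under that assumption; the `Grant` class, its fields and the sample user and dataset names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Grant:
    """A hypothetical permission: one user, one dataset, one time window."""
    user: str
    dataset: str
    valid_from: datetime
    valid_until: datetime


def may_access(grants, user, dataset, when):
    """Return True if any grant covers this user, dataset and moment."""
    return any(
        g.user == user
        and g.dataset == dataset
        and g.valid_from <= when <= g.valid_until
        for g in grants
    )


grants = [
    Grant("alice", "tweets-2014", datetime(2015, 1, 1), datetime(2015, 12, 31)),
]

inside_window = may_access(grants, "alice", "tweets-2014", datetime(2015, 6, 1))
after_expiry = may_access(grants, "alice", "tweets-2014", datetime(2016, 1, 1))
other_user = may_access(grants, "bob", "tweets-2014", datetime(2015, 6, 1))
```

Here only the first check succeeds: the second falls outside the time window and the third concerns a user with no grant.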


The process of sharing data/services on a periodic basis with a specific community for the purposes of general information and synchronisation of understanding.


Allowing for the reduction in volume and/or resolution of data by techniques such as arithmetic averaging, periodic sampling, aggregation and interpolation.
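Two of these techniques can be sketched in a few lines of Python: arithmetic averaging over fixed-size blocks, and periodic sampling. The function names and the sample readings are made up for the example.

```python
from statistics import mean


def block_average(values, block_size):
    """Reduce resolution by replacing each block of readings with its mean."""
    return [
        mean(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


def periodic_sample(values, step):
    """Reduce volume by keeping only every step-th reading."""
    return values[::step]


readings = [2, 4, 6, 8, 10, 12]  # hypothetical observations

averaged = block_average(readings, 2)  # three values instead of six
sampled = periodic_sample(readings, 3)  # two values instead of six
```

Averaging keeps some trace of every reading, whereas sampling discards readings outright; which is appropriate depends on what the observer intends to do with the reduced data.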


The process by which a data feed or analytics service is used by the WO as part of one/more larger processes


Observations may involve the exchange of data between two or more parties for the achievement of common or complementary goals. This exchange may not involve a charging structure but may nonetheless comprise a formal agreement with a delineation of responsibilities


Datasets which have been generated/harvested automatically may require a post-hoc (semi-)manual process of selection, deletion, annotation and re-classification.


For data sets which need to be refreshed or multiple streaming services there will be the requirement to co-ordinate or orchestrate the updates and staging of the data (potentially feeding into a new cycle of discovery, assembly and execution)


Addressing the ultimate purpose of WO usage which is to provide distributed solutions to decision support problems.

Cost Benefit

Addressing the resulting conclusions around the operational economics of operating WOs


Addressing the need for results from different sources to be non-contradictory.

Conformity (vs subversion)

Addressing the extent to which WO operations are blocked or enabled through the adoption of standard processes, licenses and methods of recognition and exchange.


The irrational need to start all WO processes with the letter C 😉

More seriously – this addresses legal directives/frameworks which influence the behaviour of participants for fear of legal redress


Addressing the extent to which distributed components of the larger WO eco-system align to support an overall work-flow assuming a trusted position in providing a range of services and sources.

Contract (contractual agreement)

The creation of formal agreements/rules etc regulating the use/operation of WO systems and services


Support for hypothesis testing and decision modelling


Addresses the tendency of consistent standards for sources, services and processes to emerge over time (certain aspects of the system become less chaotic/dynamic over time).


Addresses the creation of dynamic patterns of themes, and activities in the form of meta-data about the operation/usage of the WO. This belongs to “observing the observatory” or “Observing the Observers” and might be thought of as “observometrics”.


Distinct from academic citation, Credit describes attribution for discovery, participation or sharing within the eco-system resulting generally in positive (vs negative) reputation.


Addressing the meme effect of propagating interest and research on the WO through the community.

Collective Action

Addressing the effect of providing a platform around which entities may act collectively based on communities of interest


Accounting for the effect that certain sources, and services may not be free-of-charge but rather offered on a freemium or full commercial basis as a funding model to address the costs of providing the service.

Consensus (Convergence)

Accounting for the effect that understanding around certain topics may converge over time as a result of discussion/collaboration across the WO


Accounting for the effect where particular patterns of usage, behaviour and operation may become de facto rather than de jure over time as an expression of the wishes, style and preferences of the community of users and providers.


Accounting for the effect of increasing or decreasing reputation in terms of the accuracy, quality, contribution etc of a WO entity (Source, Service or User)


Observers may wish to base sensitive calculations/decisions on the observed data and hence trust + provenance will be required – particularly for automated/unattended processes.


The effect of operation and interoperation of many different datasets, tools, experiments and participants. The WO not only studies complex social machines but is itself, potentially, a complex social machine.


The consequences (positive/negative) resulting from the identification, attribution and accountability around the use of data/services under particular agreements.

Culture (Cultural norms)

The emerging typical behaviours and standards that informally appear over the lifetime of a social machine, i.e. not expressed through formal contracts but as modus vivendi practice.

Hello world!

Well .. after a year of studying Web Science at the Doctoral Training Centre at Southampton University I thought I finally deserved an opinion on the matter .. you may not agree with it but that’s life !!