Often the first question I get asked in this space is WHAT IS an Observatory? – In the fullness of time I have come to believe this is not a particularly engaging or rewarding question. IT technologies can be highly similar across a diverse set of platforms and applications ranging from Apple watches to warehouse management systems.
For example, things which appear to share a moderate level of common building blocks can be radically different in their appearance and function. Whilst it is popular to point out that your DNA is 99% similar to that of a chimp – what is less often pointed out is that your DNA is 30% similar to a Daffodil! … and so instead of building blocks, I find myself focussing much more on two other questions: HOW IS an Observatory? and WHY IS an Observatory?
Here the variations of what is done with the Observatory technology and who is applying it for what reasons seem to be much more relevant and, in any case, much more interesting.
With this in mind I have recently revisited an earlier paper called “From Search to Observation” in which I argue that even though Search Engines and Observatories share many common architectural elements (databases, APIs, graphics/analytics etc.), the essence of what they are trying to do is NOT identical.
I initially identified more than a dozen processes that seemed to emerge from an analysis of the literature and the wider dialogue in this space and published these for community feedback – not much dissent so far.
More recently I expanded this analysis on the back of a series of interviews with more than 50 participants and the resulting process list more than tripled what we had seen – particularly as I started to distinguish between input/output factors and internal processes.
An updated paper has been prepared but is not yet published. Given the restrictions on page count, a list of more than 60 factors could not easily be reproduced there, and so I have included these here for those who wish to comment on, refute or discuss the existence or classification of these factors/processes.
Brief Process and Factor definitions
Describes the existence of widespread recognition of a theme, person/group, resource/tool or other entity which sets expectations around the priority, importance and inclusion of the said entity.
e.g. there may be no evidence to support the idea that Tweets are particularly more accurate, enlightening or relevant than micro-posts from any other source, and yet the immense cultural impact Twitter has had almost certainly skews expectations of its inclusion in analysis and consideration.
Impact of the cost of usage/operation of WO systems and tying into later emergent Cost-Benefit assessment.
Which may affect how/where organisations (not only commercial organisations) are able to participate in terms of authority, jurisdiction, charter, stakeholder impact.
The aspect of connecting to existing groups or creation of new groups via the use of WO – especially where available data/tools align with the objectives and interests of a community.
Technical barriers to entry/participation vs simpler user experience are naturally likely to impact the quantity and quality of participation in WO systems.
Describes the ambient level of interaction that draws homogenous and heterogenous groups together.
Describes the existing tendencies around market-share, intellectual property and control which may affect what users are prepared to share and under which conditions.
Describes the process of collecting data/metadata about the WO interaction which might comprise information on the data, the data-source etc.
Consign (data to WO)
Describes the process of depositing/linking a data set or tool to the WO
Deals with the impact of making explicit that data is being collected and, potentially, publishing the data or analysis of the data such that previous behaviour may be affected by the disclosure.
Conflict (+ conflict resolution)
Describes the process of resolution regarding some asserted fact in the WO (ownership, value, usage, permission etc)
In contrast to a single request/response from a known search engine, the process of observation may be characterised as one or more communication processes across several repositories, starting with the discovery of sources, the disclosure of metadata, the negotiation/establishment of technical data exchange and the grant (either manual or technical) of licenses.
Where more than one repository offers the same or overlapping datasets there will be the requirement to establish a de facto or canonical source.
Clarification is a multi-step process to ascertain values through supplementary enquiry. This may apply to questions of provenance, usage or cost.
Observers will typically need to access the raw data from the one or more repositories which their search has identified – hence further individual connection protocols and processes will be required.
Where the observer’s process requires confirmation of the source (publisher) of the data to be explicitly documented a certificate format and certification process may be required
It is not anticipated that all data that will be observed will necessarily be open data and hence provided free of charge. It is anticipated that observations may involve the payment of a license fee with a mechanism to grant the permissions associated with the license vs those without a license or with a different license
Aspects of access and privacy must be addressed to ensure that data/services are accessible according to legal and ethical standards.
Allows for the process of adjusting/modifying some data/service in line with a known size of external effect such as error or bias.
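As a minimal sketch of such an adjustment, assume the external effect is a known additive bias combined with a known multiplicative scale error. The function name `correct`, its parameters and the sample readings are all hypothetical and chosen purely for illustration; a real WO would need to agree how the size of the effect is established.

```python
def correct(values, bias=0.0, scale=1.0):
    """Remove a known additive bias, then divide out a known scale error.

    Both `bias` and `scale` are assumed to be known in advance.
    """
    if scale == 0:
        raise ValueError("scale must be non-zero")
    return [(v - bias) / scale for v in values]


# Hypothetical readings that are known to be doubled and offset by 2.
readings = [12.0, 14.0, 16.0]
corrected = correct(readings, bias=2.0, scale=2.0)
print(corrected)
```

Keeping the correction as a separate, explicit step means the raw values can be retained alongside the adjusted ones.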
It is anticipated that meta data including commentary by both users and curators of the data will provide a richer environment for a qualitative understanding of data beyond stored value
Capture / Charge / Crowdsource
The process of providing data to the system through a series of individual events (capture), through the bulk upload of a dataset (charging) or through manual input (crowd-sourcing)
Observation will often be associated with longitudinal datasets from one or more sources. Whilst it is not envisaged that all observatories will seek to store all data, it is anticipated that each observatory would store some data and hence a process of regular collection, snapshotting or processing of streaming data would be required.
Each dataset may form part of a larger analytic or visualisation requiring a series of one or more computations.
The process by which an accurate output/outcome may depend on a set of meta-data (relating to the user or the problem statement) as well as the data itself.
Each repository may hold datasets in a variety of formats – metadata associated with the dataset will allow the observer to invoke appropriate format conversion services
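One way to picture this is a small registry that maps a format label in the metadata to a conversion routine. The sketch below is an assumption made for illustration only: the `format` metadata key, the MIME-style labels and the function names are hypothetical, and only the Python standard library is used.

```python
import csv
import io
import json


def csv_to_records(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def json_to_records(text):
    """Parse JSON text that already holds a list of records."""
    return json.loads(text)


# Hypothetical registry: metadata format label -> conversion service.
CONVERTERS = {
    "text/csv": csv_to_records,
    "application/json": json_to_records,
}


def load_dataset(payload, metadata):
    """Pick a converter from the dataset's metadata and apply it."""
    fmt = metadata.get("format")
    if fmt not in CONVERTERS:
        raise ValueError(f"no converter registered for format {fmt!r}")
    return CONVERTERS[fmt](payload)


csv_records = load_dataset("site,visits\nexample.org,10\n", {"format": "text/csv"})
json_records = load_dataset('[{"site": "example.org", "visits": 10}]',
                            {"format": "application/json"})
print(csv_records, json_records)
```

Note that the CSV route yields strings for every field, so any typing of values would be a further step.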
A composite data set comprising heterogenous data will allow for the possibility of correlation analysis across disjoint datasets
The data/service which is identified from a repository as part of an observation service would be formally classified according to topics using some knowledge classification schema, some access schema and may also be linked to other data or services in the Observatory
The process by which results are created between users and/or between users and machines, i.e. construction involving more than one participant.
Datasets addressing specific research questions may typically be assembled from more than one data source with either homogenous or heterogenous structures allowing for richer analysis of trends and correlation. This may fall into the area of big data or broad data.
The creation of directories of links to locally/remotely hosted data and services
Addressing the exchange of academic credit for the use of the materials or work of others through a formal reference subject to bibliometric analysis.
The creation of logical service/data sub-structures based on community membership, permissions, licenses, confidentiality, jurisdiction or other suitable frameworks
Each series of observations may be made in the context of a research question which informs the relevant curation, commentary and collaboration addressing the research question. The context also informs the services/datasets that are published out to external users of the Observatory
To allow partial access to data and/or services by means of user permissions, grant of license, time/date restrictions or other contextualisations of the entities (data, services or users)
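A minimal sketch of such a check, assuming access depends on three things together: a permission, a license and a time window. The dictionary fields, the `may_access` function and the sample users are hypothetical; real systems would draw these from their own identity and licensing services.

```python
from datetime import datetime


def may_access(user, resource, now):
    """Return True only if permission, license and time window all allow access."""
    has_permission = resource["required_permission"] in user["permissions"]
    # A resource with no license requirement (None) is open to any permitted user.
    has_license = resource["license"] is None or resource["license"] in user["licenses"]
    in_window = resource["available_from"] <= now < resource["available_until"]
    return has_permission and has_license and in_window


# Hypothetical dataset available during 2024 under a research-only license.
dataset = {
    "required_permission": "read",
    "license": "research-only",
    "available_from": datetime(2024, 1, 1),
    "available_until": datetime(2025, 1, 1),
}
alice = {"permissions": {"read"}, "licenses": {"research-only"}}
bob = {"permissions": {"read"}, "licenses": set()}

print(may_access(alice, dataset, datetime(2024, 6, 1)))
print(may_access(bob, dataset, datetime(2024, 6, 1)))
```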
The process of sharing data/services on a periodic basis with a specific community for the purposes of general information and synchronisation of understanding
Allowing for the reduction in volume and/or resolution of data by techniques such as arithmetic averaging, periodic sampling, aggregation and interpolation.
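Two of these techniques, arithmetic averaging over fixed windows and periodic sampling, can be sketched in a few lines. The function names and the sample series are assumptions for illustration; interpolation and richer aggregation are left out.

```python
def block_average(values, window):
    """Replace each run of `window` values with its arithmetic mean.

    A final, shorter run is averaged over the values it actually contains.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    chunks = [values[i:i + window] for i in range(0, len(values), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def periodic_sample(values, step):
    """Keep every `step`-th value, starting with the first."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return values[::step]


series = [1, 2, 3, 4, 5, 6, 7]
averaged = block_average(series, 3)
sampled = periodic_sample(series, 3)
print(averaged, sampled)
```

Averaging smooths out short-lived variation, whereas sampling preserves actual observed values but may miss events between samples.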
The process by which a data feed or analytics service is used by the WO as part of one/more larger processes
Observations may involve the exchange of data between two or more parties for the achievement of common or complementary goals. This exchange may not involve a charging structure but may nonetheless comprise a formal agreement with a delineation of responsibilities
Datasets which may be generated/harvested automatically may require a post-hoc (semi-)manual process of selection, deletion, annotation and re-classification.
For data sets which need to be refreshed or multiple streaming services there will be the requirement to co-ordinate or orchestrate the updates and staging of the data (potentially feeding into a new cycle of discovery, assembly and execution)
Addressing the ultimate purpose of WO usage which is to provide distributed solutions to decision support problems.
Addressing the resulting conclusions around the operational economics of operating WOs
Addressing the need for results from different sources to be non-contradictory.
Conformity (vs subversion)
Addressing the extent to which WO operations are blocked or enabled through the adoption of standard processes, licenses and methods of recognition and exchange.
The irrational need to start all WO processes with the letter C 😉
More seriously – this addresses legal directives/frameworks which influence the behaviour of participants for fear of legal redress
Addressing the extent to which distributed components of the larger WO eco-system align to support an overall work-flow assuming a trusted position in providing a range of services and sources.
Contract (contractual agreement)
The creation of formal agreements/rules etc regulating the use/operation of WO systems and services
Support for hypothesis testing and decision modelling
Addresses the tendency of consistent standards for sources, services and processes to emerge over time (certain aspects of the system become less chaotic/dynamic over time)
Addresses the creation of dynamic patterns of themes and activities in the form of meta-data about the operation/usage of the WO. This belongs to “observing the observatory” or “Observing the Observers” and might be thought of as “observometrics”.
Distinct from academic citation, Credit describes attribution for discovery, participation or sharing within the eco-system resulting generally in positive (vs negative) reputation.
Addressing the meme effect of propagating interest and research on the WO through the community
Addressing the effect of providing a platform around which entities may act collectively based on communities of interest
Accounting for the effect that certain sources, and services may not be free-of-charge but rather offered on a freemium or full commercial basis as a funding model to address the costs of providing the service.
Accounting for the effect that understanding around certain topics may converge over time as a result of discussion/collaboration across the WO
Accounting for the effect where particular patterns of usage, behaviour and operation may become de facto rather than de jure over time as an expression of the wishes, style and preferences of the community of users and providers.
Accounting for the effect of increasing or decreasing reputation in terms of the accuracy, quality, contribution etc of a WO entity (Source, Service or User)
Observers may wish to base sensitive calculations/decisions on the observed data and hence trust + provenance will be required – particularly for automated/unattended processes.
The effect of operation and interoperation of many different datasets, tools, experiments and participants. The WO not only studies complex social machines but is itself, potentially, a complex social machine.
The results (positive/negative) resulting from the identification, attribution and accountability around the use of data/services under particular agreements
Culture (Cultural norms)
The emerging typical behaviours and standards that informally appear over the lifetime of a social machine. i.e. not expressed through formal contracts but as modus vivendi practice