From Search to Observation: an update

Background

Often the first question I get asked in this space is WHAT IS an Observatory? Over time I have come to believe this is not a particularly engaging or rewarding question. IT technologies can be highly similar across a diverse set of platforms and applications, ranging from Apple watches to warehouse management systems.

For example, things which appear to share a moderate level of common building blocks can be radically different in their appearance and function. Whilst it is popular to point out that your DNA is 99% similar to that of a chimp, what is less often pointed out is that your DNA is 30% similar to that of a daffodil! And so, instead of building blocks, I find myself focussing much more on two other questions: HOW IS an Observatory? and WHY IS an Observatory?

Here the variations of what is done with the Observatory technology and who is applying it for what reasons seem to be much more relevant and, in any case, much more interesting.

With this in mind I have recently revisited an earlier paper called “From Search to Observation” in which I argue that even though Search Engines and Observatories share many common architectural elements (databases, APIs, graphics/analytics etc.), the essence of what they are trying to do is NOT identical.

I initially identified more than a dozen processes that seemed to emerge from an analysis of the literature and the wider dialogue in this space and published these for community feedback – not much dissent so far.

More recently I expanded this analysis on the back of a series of interviews with more than 50 participants and the resulting process list more than tripled what we had seen – particularly as I started to distinguish between input/output factors and internal processes.

An updated paper has been prepared but is not yet published. Given the restrictions on page count, a list of more than 60 factors could not easily be reproduced there, and so I have included them here for those who wish to comment on, refute or discuss the existence or classification of these factors/processes.

Overview

Brief Process and Factor definitions

Celebrity

Describes the existence of wide-spread recognition of a theme, person/group, resource/tool or other entity which sets expectations around priority, importance and inclusion of the said entity.

e.g. there may be no evidence to support the idea that Tweets are particularly more accurate, enlightening or relevant than micro-posts from any other source, and yet the immense cultural impact Twitter has had almost certainly skews expectations of its inclusion in analysis and consideration.

Cost

The impact of the cost of using/operating WO systems, which ties into the later emergent Cost-Benefit assessment.

Corporate Structure

Which may affect how/where organisations (not only commercial organisations) are able to participate in terms of authority, jurisdiction, charter, stakeholder impact.

Community

The aspect of connecting to existing groups or creation of new groups via the use of WO – especially where available data/tools align with the objectives and interests of a community.

Convenience

Technical barriers to entry/participation vs simpler user experience are naturally likely to impact the quantity and quality of participation in WO systems.

Collegiality

Describes the ambient level of interaction that draws homogenous and heterogenous groups together.

Commercial interests

Describes the existing tendencies around market-share, intellectual property and control which may affect what users are prepared to share and under which conditions.

Collection

Describes the process of collecting data/metadata about the WO interaction which might comprise information on the data, the data-source etc

Consign (data to WO)

Describes the process of depositing/linking a data set or tool to the WO

Conspicuous collection

Deals with the impact of making explicit that data is being collected and, potentially, publishing the data or analysis of the data, such that previously observed behaviour may change as a result of the disclosure.

Conflict (+ conflict resolution)

Describes the process of resolution regarding some asserted fact in the WO (ownership, value, usage, permission etc)

Communication

In contrast to a single request/response from a known search engine, the process of observation may be characterised as one or more communication processes across several repositories, starting with the discovery of sources, the disclosure of metadata, the negotiation/establishment of technical data exchange and the grant (either manual or technical) of licenses.

Canonical Sources

Where more than one repository offers the same or overlapping datasets there will be the requirement to establish a de facto or canonical source.

Clarification

Clarification is a multi-step process to ascertain values through supplementary enquiry. This may apply to questions of provenance, usage or cost.

Connection

Observers will typically need to access the raw data from the one or more repositories which their search has identified, hence further individual connection protocols and processes will be required.

Certification

Where the observer’s process requires the source (publisher) of the data to be confirmed and explicitly documented, a certificate format and certification process may be required.

Charging Models

It is not anticipated that all data that will be observed will necessarily be open data and hence provided free of charge. It is anticipated that observations may involve the payment of a license fee with a mechanism to grant the permissions associated with the license vs those without a license or with a different license

Confidentiality

Aspects of access and privacy must be addressed to ensure that data/services are accessible according to legal and ethical standards.

Calibration

Allows for the process of adjusting/modifying some data/service in line with a known size of external effect such as error or bias.
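As a rough illustration of calibration, the sketch below removes a known additive bias and applies a known multiplicative correction. The function name `calibrate`, its parameters and the sample figures are assumptions made purely for illustration; they are not taken from any WO implementation.

```python
def calibrate(readings, bias=0.0, scale=1.0):
    """Subtract a known additive bias, then apply a known scale factor.

    Both `bias` and `scale` are assumed to come from some external
    estimate of the error; this sketch does not estimate them.
    """
    return [(value - bias) * scale for value in readings]


# Illustrative values only: a source known to over-count by 2.
raw_counts = [102.0, 98.0, 105.0]
corrected = calibrate(raw_counts, bias=2.0)
```

Keeping the correction as a separate step, rather than overwriting the stored values, would leave the raw data available should the estimate of the bias later change.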

Commentary

It is anticipated that meta data including commentary by both users and curators of the data will provide a richer environment for a qualitative understanding of data beyond stored value

Capture / Charge / Crowdsource

The process of providing data to the system through a series of individual events (capture), through the bulk upload of a dataset (charging) or through manual input (crowd-sourcing)

Collection

Observation will often be associated with longitudinal datasets from one or more sources. Whilst it is not envisaged that all observatories will seek to store all data, it is anticipated that each observatory would store some data, and hence a process of regular collection, snapshotting or processing of streaming data would be required.

Computation

Each data set may form part of a larger analytic or visualisation requiring a series of one or more computations.

Contextualise

The process by which an accurate output/outcome may depend on a set of meta-data (relating to the user or the problem statement) as well as the data itself.

Conversion

Each repository may hold datasets in a variety of formats – metadata associated with the dataset will allow the observer to invoke appropriate format conversion services
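One way to picture metadata-driven conversion is a small registry that maps a declared format to a converter. This is only a sketch: the `format` metadata key, the `CONVERTERS` registry and the `load_records` helper are hypothetical names, and a real observatory would likely need many more formats.

```python
import csv
import io
import json


def from_csv(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def from_json(text):
    """Parse JSON text (assumed here to be a list of objects)."""
    return json.loads(text)


# Hypothetical registry keyed by the format declared in the metadata.
CONVERTERS = {"text/csv": from_csv, "application/json": from_json}


def load_records(payload, metadata):
    """Pick a converter from the dataset's metadata and apply it."""
    fmt = metadata["format"]
    if fmt not in CONVERTERS:
        raise ValueError(f"no converter registered for format {fmt!r}")
    return CONVERTERS[fmt](payload)


csv_rows = load_records("id,value\n1,10\n2,20\n", {"format": "text/csv"})
json_rows = load_records('[{"id": "1", "value": "10"}]',
                         {"format": "application/json"})
```

Both calls yield the same row structure, which is the point of the exercise: the observer works with one shape of record regardless of how the repository stored it.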

Correlation

A composite data set comprising heterogenous data will allow for the possibility of correlation analysis across disjoint datasets
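A minimal sketch of such an analysis follows, assuming the two datasets can be aligned on a shared key (here a day label) and that Pearson's coefficient is an acceptable measure. The dataset names and values are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_on_shared_keys(a, b):
    """Correlate two keyed datasets using only the keys they share."""
    keys = sorted(a.keys() & b.keys())
    return pearson([a[k] for k in keys], [b[k] for k in keys])


# Invented example data from two notionally separate sources.
posts_per_day = {"mon": 1.0, "tue": 2.0, "wed": 3.0, "thu": 4.0}
visits_per_day = {"tue": 20.0, "wed": 30.0, "thu": 40.0, "fri": 50.0}
r = correlate_on_shared_keys(posts_per_day, visits_per_day)
```

Only the overlapping keys (tue, wed, thu) contribute, so how the datasets are aligned is itself a design decision that affects the result.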

Classification

The data/service which is identified from a repository as part of an observation service would be formally classified according to topics using some knowledge classification schema, some access schema and may also be linked to other data or services in the Observatory

Co-creation

The process by which results are created between users and/or between users and machines, i.e. Construction involving more than one participant.

Construction

Datasets addressing specific research questions may typically be assembled from more than one data source with either homogenous or heterogenous structures allowing for richer analysis of trends and correlation. This may fall into the area of big data or broad data.

Catalogues

The creation of directories of links to locally/remotely hosted data and services

Citation

Addressing the exchange of academic credit for the use of the materials or work of others through a formal reference subject to bibliometric analysis.

Compartmentalisation

The creation of logical service/data sub-structures based on community membership, permissions, licenses, confidentiality, jurisdiction or other suitable frameworks

Contextualise

Each series of observations may be made in the context of a research question which informs the relevant curation, commentary and collaboration addressing the research question. The context also informs the services/datasets that are published out to external users of the Observatory

Constrain(ts)

To allow partial access to data and/or services by means of user permissions, grant of license, time/date restrictions or other contextualisations of the entities (data, services or users).
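The combination of permissions, licenses and time/date restrictions could be sketched as a single access check. The `Constraint` class, its field names and the `is_allowed` function are hypothetical and stand in for whatever mechanism a given observatory actually uses.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Constraint:
    """Hypothetical rule attached to a dataset or service."""
    permission: str
    license: Optional[str] = None
    not_before: Optional[datetime] = None
    not_after: Optional[datetime] = None


def is_allowed(permissions, licenses, constraint, when):
    """Return True only if every part of the constraint is satisfied."""
    if constraint.permission not in permissions:
        return False
    if constraint.license is not None and constraint.license not in licenses:
        return False
    if constraint.not_before is not None and when < constraint.not_before:
        return False
    if constraint.not_after is not None and when > constraint.not_after:
        return False
    return True


# Illustrative rule: readable under a research licence during 2024 only.
rule = Constraint(permission="read", license="research-only",
                  not_before=datetime(2024, 1, 1),
                  not_after=datetime(2024, 12, 31))
inside = is_allowed({"read"}, {"research-only"}, rule, datetime(2024, 6, 1))
outside = is_allowed({"read"}, {"research-only"}, rule, datetime(2025, 1, 1))
unlicensed = is_allowed({"read"}, set(), rule, datetime(2024, 6, 1))
```

Here `inside` is granted, while `outside` fails the date window and `unlicensed` fails the licence check.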

Circulation

The process of sharing data/services on a periodic basis with a specific community for the purposes of general information and synchronisation of understanding.

Conflation-Compression

Allowing for the reduction in volume and/or resolution of data by techniques such as arithmetic averaging, periodic sampling, aggregation and interpolation.
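Two of these techniques, arithmetic averaging over fixed-size blocks and periodic sampling, can be sketched in a few lines. The function names and the sample series are illustrative assumptions only.

```python
def block_average(values, size):
    """Replace each run of `size` values with its arithmetic mean.

    A final, shorter block is averaged over the values it contains.
    """
    blocks = [values[i:i + size] for i in range(0, len(values), size)]
    return [sum(block) / len(block) for block in blocks]


def periodic_sample(values, step):
    """Keep every `step`-th value, starting with the first."""
    return values[::step]


series = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
averaged = block_average(series, 2)   # [2.0, 6.0, 10.0]
sampled = periodic_sample(series, 3)  # [1.0, 7.0]
```

Averaging reduces resolution while still reflecting every original value, whereas sampling discards the values in between, so the choice depends on what later analysis needs.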

Consumption

The process by which a data feed or analytics service is used by the WO as part of one/more larger processes

Collaboration

Observations may involve the exchange of data between two or more parties for the achievement of common or complementary goals. This exchange may not involve a charging structure but may nonetheless comprise a formal agreement with a delineation of responsibilities

Curation

Datasets, which may be generated/harvested automatically, may require a post-hoc (semi-)manual process of selection, deletion, annotation and re-classification.

Choreography

For data sets which need to be refreshed or multiple streaming services there will be the requirement to co-ordinate or orchestrate the updates and staging of the data (potentially feeding into a new cycle of discovery, assembly and execution)

Confirmation

Addressing the ultimate purpose of WO usage which is to provide distributed solutions to decision support problems.

Cost Benefit

Addressing the resulting conclusions around the operational economics of operating WOs

Coherence

Addressing the need for results from different sources to be non-contradictory.

Conformity (vs subversion)

Addressing the extent to which WO operations are blocked or enabled through the adoption of standard processes, licenses and methods of recognition and exchange.

Compulsion

The irrational need to start all WO processes with the letter C 😉

More seriously – this addresses legal directives/frameworks which influence the behaviour of participants for fear of legal redress

Cohesion

Addressing the extent to which distributed components of the larger WO eco-system align to support an overall work-flow assuming a trusted position in providing a range of services and sources.

Contract (contractual agreement)

The creation of formal agreements/rules etc regulating the use/operation of WO systems and services

Conclusions

Support for hypothesis testing and decision modelling

Consistency

Addresses the tendency of consistent standards for sources, services and processes to emerge over time (certain aspects of the system become less chaotic/dynamic over time).

Cascades

Addresses the creation of dynamic patterns of themes and activities in the form of meta-data about the operation/usage of the WO. This belongs to “observing the observatory” or “Observing the Observers” and might be thought of as “observometrics”.

Credit

Distinct from academic citation, Credit describes attribution for discovery, participation or sharing within the eco-system resulting generally in positive (vs negative) reputation.

Catalysing

Addressing the meme effect of propagating interest and research on the WO through the community.

Collective Action

Addressing the effect of providing a platform around which entities may act collectively based on communities of interest

Commercialisation

Accounting for the effect that certain sources and services may not be free-of-charge but rather offered on a freemium or full commercial basis as a funding model to address the costs of providing the service.

Consensus (Convergence)

Accounting for the effect that understanding around certain topics may converge over time as a result of discussion/collaboration across the WO

Convention

Accounting for the effect where particular patterns of usage, behaviour and operation may become de facto rather than de jure over time as an expression of the wishes, style and preferences of the community of users and providers.

Credibility

Accounting for the effect of increasing or decreasing reputation in terms of the accuracy, quality, contribution etc of a WO entity (Source, Service or User)

Confidence

Observers may wish to base sensitive calculations/decisions on the observed data and hence trust + provenance will be required – particularly for automated/unattended processes.

Complexity

The effect of operation and interoperation of many different datasets, tools, experiments and participants. The WO not only studies complex social machines but is itself, potentially, a complex social machine.

Consequence

The outcomes (positive/negative) that follow from the identification, attribution and accountability around the use of data/services under particular agreements.

Culture (Cultural norms)

The emerging typical behaviours and standards that informally appear over the lifetime of a social machine. i.e. not expressed through formal contracts but as modus vivendi practice