
Software & Database

OCSEAN is dedicated to developing software for integrated data analysis and to creating open-access databases.


Integrating datasets from different disciplines is hard because the data often differ qualitatively in meaning, scale and reliability. When two datasets describe the same entities, many scientific questions can be framed as whether the (dis)similarities between entities are conserved across these different data. Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise and helps to interpret them. We illustrate this with three diverse comparisons: gene methylation versus expression, the evolution of language sounds versus word use, and country-level economic metrics versus cultural beliefs. The approach is non-parametric, robust to noise and to differences in scaling, and makes only weak assumptions about how the data were generated. It works by decomposing similarities into two components: a 'structural' component, analogous to a clustering, and an underlying 'relationship' between those structures. This enables a 'structural comparison' of two similarity matrices, based on how well one can be predicted from the 'structure' of the other. Significance is assessed by resampling in a way appropriate to each dataset.
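The structural comparison described above can be illustrated with a minimal sketch. This is not the CLARITY package's implementation (CLARITY is an R package with its own fitting and resampling procedures). For illustration only, it assumes that the 'structure' of a symmetric similarity matrix Y1 is the span of its k leading eigenvectors A, so that Y1 ≈ A X1 Aᵀ with X1 the 'relationship'. Keeping A fixed and refitting only the relationship for a second matrix, X2 = Aᵀ Y2 A (the least-squares fit when A has orthonormal columns), gives the best prediction of Y2 from Y1's structure. Large residuals flag where the datasets disagree. The function names and toy data are hypothetical.

```python
import numpy as np


def fit_structure(Y, k):
    """Hypothetical helper: rank-k 'structure' A and 'relationship' X with Y ~ A X A^T.

    Assumes Y is symmetric and keeps its k largest-magnitude eigenpairs,
    a simplification, not the fitting used by the CLARITY R package.
    """
    vals, vecs = np.linalg.eigh(Y)                   # real eigenpairs of a symmetric matrix
    order = np.argsort(np.abs(vals))[::-1][:k]       # indices of the k largest |eigenvalues|
    A = vecs[:, order]                               # N x k, orthonormal columns
    X = np.diag(vals[order])                         # k x k relationship
    return A, X


def predict_from_structure(A, Y2):
    """Refit only the relationship for Y2 while holding the structure A fixed.

    Because A has orthonormal columns, X2 = A^T Y2 A minimises ||Y2 - A X2 A^T||_F.
    """
    X2 = A.T @ Y2 @ A
    return A @ X2 @ A.T


def structural_residuals(Y1, Y2, kmax):
    """Relative squared residual of Y2 predicted from Y1's structure, for k = 1..kmax."""
    out = []
    for k in range(1, kmax + 1):
        A, _ = fit_structure(Y1, k)
        R = Y2 - predict_from_structure(A, Y2)       # part of Y2 not explained by Y1's structure
        out.append(float(np.sum(R ** 2) / np.sum(Y2 ** 2)))
    return out


# Toy data (invented for illustration): 6 entities in two groups.
groups = np.array([0, 0, 0, 1, 1, 1])
A0 = np.eye(2)[groups]                               # 6 x 2 one-hot membership matrix
Y1 = A0 @ np.array([[1.0, 0.2], [0.2, 1.0]]) @ A0.T
Y2 = A0 @ np.array([[0.5, 0.9], [0.9, 2.0]]) @ A0.T  # same groups, different relationship
B0 = np.eye(2)[np.array([0, 1, 0, 1, 0, 1])]         # a different grouping of the same entities
Y3 = B0 @ np.array([[1.0, 0.2], [0.2, 1.0]]) @ B0.T

res_same = structural_residuals(Y1, Y2, kmax=2)
res_diff = structural_residuals(Y1, Y3, kmax=2)
print(res_same)   # k = 2 entry is essentially zero: Y2 shares Y1's structure
print(res_diff)   # k = 2 entry is roughly 0.3: Y3's structure differs
```

Rows of the residual matrix R with large values point to the entities where the second dataset departs from the first dataset's structure. In CLARITY, whether such residuals are significant is judged by resampling, which this sketch omits.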

CLARITY is available as an R package from:

github.com/danjlawson/CLARITY.

Cite: 

CLARITY: comparing heterogeneous data using dissimilarity

Lawson DJ, Solanki V, Yanovich I, Dellert J, Ruck D, Endicott P. CLARITY: comparing heterogeneous data using dissimilarity. R Soc Open Sci. 2021 Dec 8;8(12):202182. doi: 10.1098/rsos.202182. PMID: 34909208; PMCID: PMC8652278.

In its aim of integrating linguistics, genetics and archaeological observations into a single analytical framework, the OCSEAN project faces a major challenge: archaeological data are non-standardized, poorly accessible and sparse. To fill this gap, Open-archeOcsean, an interactive catalogue of open-source datasets for the archaeology of the Pacific and Southeast Asia regions, was initiated to list the open-access datasets available for these areas. Development of Open-archeOcsean can continue as a collective endeavour, within and beyond the OCSEAN community, and all contributions are welcome.

Versions of Open-archeOcsean are archived for long-term preservation on:

GitHub

Zenodo

Software Heritage

The contact address for contributions to Open-archeOcsean is: sebastien.plutniak_at_cnrs.fr

Presentations of Open-archeOcsean by Sébastien Plutniak:

OCSEAN Bali Conference 2025,

July 22nd 2025

Title: "Open and Reusable Data for Southeast Asia and Oceania Archaeology: Review of Available Resources and Prospects"

Computer Applications and Quantitative Methods in Archaeology Australasia Conference,

October 3rd, 2025

Title: "Map to Navigate the Data Archipelago: the “Open-archeOcsean” Catalogue of Open-source Datasets for Pacific and Southeast Asia Archaeology. Situation, Limits, and Prospects"

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 873207.


© 2022 by The OCSEAN Consortium
