Fair and modular blockchain data infrastructure for open science and society – FairOnChain
Public blockchains, such as Bitcoin and Ethereum, are publicly accessible by design, but their data cannot be easily accessed and analysed without proper structure and indexing. The objective of this project is to develop a publicly accessible infrastructure that makes blockchain data easy to access and search, in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable) of open science. This will enable full transparency and reproducibility of scientific results in the blockchain field, which is not possible today, and will facilitate the growth of new and existing applications and collaborations.
At present, structured analyses are typically performed using proprietary solutions and databases, which make reproducibility and sharing of data within the scientific community challenging and expensive. Additionally, even though scientific studies often perform common operations to collect data systematically, the tools and libraries developed and used are rarely shared with the broader scientific community. As a result, similar research carried out by different institutions and groups often requires the re-implementation of the same software tools, leading to wasted resources and an inability to reproduce and compare results.
As part of this project, we plan to provide the scientific community with:
(a) Publicly accessible and expandable datasets and infrastructure that include structured, daily updated blockchain transaction data. Researchers will be able to access raw transaction data and community-maintained, enriched datasets in a uniform and open manner, promoting the availability and reuse of these complex data.
(b) An open-source software framework and standardized data access APIs for querying, annotating, and referencing data, and for building well-described, reusable workflows and pipelines that facilitate the exchange and replication of scientific results in line with the FAIR principles of open science.
This project also aims to provide the European Commission with an effective tool to certify blockchain transactions, as required by the new European tax rules proposed on 8 December 2022 and known as the eighth Directive on Administrative Cooperation (DAC8), for which no public and generally accepted solution exists yet.
According to Google Scholar, over 547,000 scientific articles using the keyword "blockchain", and 13,000 using the combination "blockchain data", have been published since 2013, and these numbers have grown rapidly in recent years. However, only about 200 publicly accessible datasets associated with these studies have been identified, and the quality and reliability of these reference data collections are generally unknown. This highlights the pressing need for an open and reusable reference database and software solution for research in the rapidly growing field of blockchain networks and data analytics. In other research fields such as healthcare, bioscience, particle physics, geoscience, and astrophysics, there are publicly accessible and maintained methods, open-source software tools, and databases (e.g., UK Biobank, UniProt, CERN Open Data, ESA Gaia Archive, NASA Planetary Data System) where researchers can collect and share their data. This is not currently the case for blockchain data, and the objective of this project is to fill this gap with an effective and publicly accepted solution.
Project coordination
Julien Prat (Centre de Recherche en Economie et Statistique - CREST)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
EPFL / SCI STI MM École Polytechnique Fédérale de Lausanne
HEG / HES-SO Haute École de Gestion de Genève - HES-SO
CREST Centre de Recherche en Economie et Statistique - CREST
ICL Imperial College London
ANR funding: 135,597 euros
Beginning and duration of the scientific project: November 2023, 24 months