On-chain data analysis fundamentals: what, why and how?

This article gives an overview of, and introduction to, on-chain data analysis; future articles will dive into the specifics.

Essentially, on-chain data analysis (also called on-chain analysis) encompasses the steps needed to understand and visualise the numerous data points stored on a blockchain, with the aim of gaining valuable insights. The term “on-chain” refers to the type of data used in the analysis: for any given blockchain network, data stored on its blockchain is labelled “on-chain” data, while “off-chain” refers to data held outside that specific blockchain network. Depending on the blockchain network, raw on-chain data may comprise data related to blocks (such as block hash and block timestamp), data related to transactions (including the addresses of the sender and beneficiary, and the transferred amount), and smart contract code and data. These data points can be collected, cleaned, and stored using various data analysis methodologies and tools, and then modelled and queried to surface connections, generate key findings, and create a basis for deriving further relevant intelligence.
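To make the shape of raw on-chain data more tangible, here is a minimal sketch in Python of the block and transaction fields mentioned above. The class names, field names, and sample values are illustrative assumptions; real networks define their own, richer schemas.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transaction:
    sender: str        # address of the sender
    beneficiary: str   # address of the beneficiary
    amount: float      # transferred amount, in the network's native unit


@dataclass
class Block:
    block_hash: str    # placeholder value below, not a real hash
    timestamp: int     # Unix time in seconds
    transactions: List[Transaction] = field(default_factory=list)


def total_transferred(block: Block) -> float:
    """Sum the amounts of all transactions stored in one block."""
    return sum(tx.amount for tx in block.transactions)


# Hypothetical sample data for illustration only.
example_block = Block(
    block_hash="0xabc",
    timestamp=1_700_000_000,
    transactions=[
        Transaction("addr_A", "addr_B", 1.5),
        Transaction("addr_B", "addr_C", 0.25),
    ],
)

print(total_transferred(example_block))
```

Once the data is in a structure like this, the cleaning, modelling, and querying steps become ordinary data processing.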

Using historical and/or real-time on-chain data, many use cases for on-chain data analysis have emerged. They include, among many others:
:: investors using historical on-chain data points for price analysis to guide their digital asset investment decisions;
:: ecosystem actors analysing key metrics (for example market capitalisation and transaction volume) to understand macro trends;
:: companies utilising real-time on-chain data for transaction screening to comply with anti-money laundering obligations (read more about these obligations in our previous article on measures to detect ML/FT involving digital assets).
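As a rough illustration of one such metric, the sketch below aggregates transaction volume per calendar day (UTC). The input format, a list of (Unix timestamp, amount) pairs, and the sample values are assumptions made for this example, not a format prescribed by any particular blockchain.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_volume(transfers):
    """Group transferred amounts by UTC calendar day.

    `transfers` is an iterable of (unix_timestamp, amount) pairs.
    """
    volume = defaultdict(float)
    for timestamp, amount in transfers:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()
        volume[day] += amount
    return dict(volume)


# Hypothetical sample data for illustration only.
sample_transfers = [
    (1_700_000_000, 2.0),  # 2023-11-14 UTC
    (1_700_003_600, 0.5),  # one hour later, same day
    (1_700_090_000, 1.0),  # 2023-11-15 UTC
]

print(daily_volume(sample_transfers))
```

In practice the same grouping would usually be done in a database or a dataframe library, but the logic stays the same.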

So, how does on-chain data analysis work? To understand this, we need to start with some blockchain fundamentals and how the technology shapes the stored data. A blockchain can be seen as a database that is shared across many computers in a network (called nodes), each of which holds an identical copy of the data store (called the ledger), and over which no single entity has control (decentralised). Further, each data record is fingerprinted using cryptographic hash functions: the resulting hash is, for practical purposes, unique to the record and changes as soon as the record changes.
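The role of hashes can be sketched with a few lines of Python. SHA-256 is used here as an example of a cryptographic hash function; the record strings are made up, and actual networks hash structured binary data rather than plain sentences.

```python
import hashlib


def record_hash(record: str) -> str:
    """Return the SHA-256 fingerprint of a record as a hex string."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


original = record_hash("A pays B 1.5")
repeated = record_hash("A pays B 1.5")  # same input, same fingerprint
tampered = record_hash("A pays B 15")   # small change, different fingerprint

print(original == repeated)
print(original == tampered)
```

The same input always yields the same fingerprint, while even a one-character change produces a different one, which is what lets the network detect altered records.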

Let’s look at an illustrative example: Person A would like to send a specific amount of a cryptocurrency from their account (called a wallet) to another one. To do so, person A submits a transaction request to the blockchain network. Information about this request is then shared across the network, cryptographically secured, for verification and confirmation of legitimacy, which follows a pre-defined algorithmic validation method (called a consensus mechanism). Once consensus is reached, the data related to the transaction is added to the database, meaning it is stored in a block together with other transactions, and the individual copies of the data store held by the network participants are updated. Importantly, existing data points are not overwritten by new entries: blocks are connected to each other in sequential order, which creates a permanent data record, or in other words a history of the blockchain. Lastly, records are practically immutable because each block is cryptographically linked to the one before it, so the network would notice any change to the data.
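The way blocks are connected in sequence, and why a change to old data gets noticed, can be illustrated with a toy chain. This is a simplified sketch under stated assumptions: it has no network, no consensus mechanism, and no signatures, and the function names and block layout are invented for the example.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Fingerprint a block by hashing its JSON representation."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    """Add a block that stores the hash of the block before it."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({
        "index": len(chain),
        "previous_hash": previous,
        "transactions": transactions,
    })


def is_valid(chain: list) -> bool:
    """Check that every block still points at the unmodified previous block."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["A pays B 1.5"])
append_block(chain, ["B pays C 0.25"])
append_block(chain, ["C pays A 0.1"])
print(is_valid(chain))
```

Changing a transaction in an earlier block alters that block's hash, so the next block no longer points at it and `is_valid` returns `False`. In this toy version the most recent block is not protected by a successor; real networks rely on consensus and the many independent copies of the ledger for that.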

The next question is how to obtain these historical and real-time data points. The answer depends on the blockchain type, which determines access to the network: “public” blockchains have open access, so anybody can join the network and read the stored data. In contrast, “private” blockchains only grant access to whitelisted participants.
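For a public blockchain, reading data often means querying a node. The sketch below assumes an Ethereum-compatible network, which the article does not specify, and uses the standard `eth_getBlockByNumber` JSON-RPC method. The `node_url` argument is a placeholder for whatever node endpoint you have access to, and the helper names are invented for this example.

```python
import json
import urllib.request


def build_block_request(block_tag="latest", full_transactions=False, request_id=1):
    """Build the JSON-RPC payload that asks a node for one block."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        "params": [block_tag, full_transactions],
        "id": request_id,
    }


def fetch_block(node_url, block_tag="latest"):
    """Send the request to a node and return the block data.

    `node_url` is an assumption: supply the URL of a node you can reach.
    """
    body = json.dumps(build_block_request(block_tag)).encode("utf-8")
    request = urllib.request.Request(
        node_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["result"]


print(json.dumps(build_block_request("latest"), indent=2))
```

Calling `fetch_block` with a reachable node URL would return the latest block, including fields such as its hash and timestamp. Other networks expose different interfaces, so treat this as one possible access path rather than a general recipe.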

Once the raw data is obtained, different data modelling tools are applied depending on the specific use case (i.e. the insights one would like to gain). We will provide some concrete examples in future blog posts. Stay tuned!
