Price data and transactions

Jun 21, 2022

In quantitative analysis, data is the foundation for further discovery. When studying cryptocurrency prices, the required data lives on exchange platforms and has to be collected before any analysis can begin. The main problem is that this data is fragmented across the many exchanges in the industry. The good part is that prices for the same asset are usually close in value across exchanges, and when they are not, the gap is an arbitrage opportunity. The bad part is that finding those opportunities can be computationally expensive, because the prices on every exchange have to be checked to find the largest differences.

When setting out to collect the data, two main classes of cryptocurrency exchange emerge: centralized and decentralized. They differ more than one would expect, from the execution mechanism to the way prices are calculated. Both types list pairs on which trades are executed, but the abstraction called a pair is different in each. Centralized exchanges usually rely on order books: there are different types of orders, and the interaction between those orders produces the price. Decentralized exchanges usually implement an equation that defines a bonding curve, which yields deterministic pricing.

Order books

An order book is a list of orders, each at a given price level. For each price, it records the amount of asset A that is resting there, waiting for the matching engine to execute it. Suppose we are trading on exchange XYZ in the pair A/USD and the last execution took place at $100. The order book is then usually broken down into buy orders, sell orders and the market order history.

1. Buy orders are those resting at a price level lower than $100. They provide liquidity to sellers.

2. Sell orders are those resting at a price level higher than $100. They provide liquidity to buyers.

3. The market order history is an ordered list of every market order that has taken place in the past. It shows the list of actions that extracted liquidity from the order book.

The difference between the highest buy price level and the lowest sell price level is called the bid-ask spread. Some strategies take advantage of it.
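The structure described above can be sketched in a few lines of Python. This is only an illustration: the price levels and quantities are made up, and a real exchange feed would carry much more detail (order ids, timestamps, order types).

```python
from decimal import Decimal

# Hypothetical snapshot of the A/USD book: price level -> resting quantity of A.
# Buy orders (bids) rest below the last price of $100, sell orders (asks) above it.
bids = {Decimal("99.5"): Decimal("4"), Decimal("99.0"): Decimal("10"), Decimal("98.0"): Decimal("7")}
asks = {Decimal("100.5"): Decimal("3"), Decimal("101.0"): Decimal("8"), Decimal("102.0"): Decimal("5")}


def best_bid(bids):
    """Highest price at which a resting buy order exists."""
    return max(bids)


def best_ask(asks):
    """Lowest price at which a resting sell order exists."""
    return min(asks)


def bid_ask_spread(bids, asks):
    """Gap between the lowest sell level and the highest buy level."""
    return best_ask(asks) - best_bid(bids)


print(best_bid(bids), best_ask(asks), bid_ask_spread(bids, asks))
```

Decimal is used instead of float so that price levels compare exactly, which matters when they serve as dictionary keys.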

Binance trade terminal showing Depth of Market (DOM).

The market order history can be downloaded publicly from some exchanges as “trades data”, but not from all of them: tick data is rarely free and APIs are usually capped. This is the purest type of data, because it records every execution that took place in the order book. Almost every other standard type of data is some form of feature engineering over it. For example, take two events and order them by time: the earlier one gives an open price, the later one gives a closing price, and the interval covered is the difference between their timestamps.

On-chain transactions

Transactions are a data type common to blockchains, and each block includes a list of them. There are plenty of reasons for an agent to send a transaction for inclusion in a block; one of them is to trade tokens on a decentralized exchange. It is worth mentioning that Layer 1 blockchains have solutions built on top of them, called Layer 2, which also contain transactions. Layer 2 solutions usually exist for the sake of speed, but more generally they address some problem that prevails in the underlying blockchain by handling it with another technology and sending the result back to the original chain.

Example transaction (Layer 1) in a decentralized exchange (Uniswap V3 — Ethereum). Image from Etherscan.

On a blockchain (taking Ethereum as the reference), assets are usually represented as tokens, which are a type of smart contract. Such a contract usually has a variable named “balances”, a mapping from addresses to numbers: it assigns a quantity of tokens to each address, with a default value of 0. Tokens have more functionality than this, but it won’t be covered here.
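As a rough mental model, the “balances” mapping behaves like a dictionary with a default value of 0. The toy class below is written in Python rather than in a smart contract language, and its names (`Token`, `balance_of`, `transfer`) are illustrative assumptions, not the interface of any real token standard.

```python
from collections import defaultdict


class Token:
    """Toy model of a fungible token: a mapping from addresses to balances."""

    def __init__(self, initial_holder, supply):
        # Every address implicitly starts with a balance of 0.
        self.balances = defaultdict(int)
        self.balances[initial_holder] = supply

    def balance_of(self, address):
        # .get avoids creating an entry for addresses that were never touched.
        return self.balances.get(address, 0)

    def transfer(self, sender, recipient, amount):
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.balance_of(sender) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] += amount


# Hypothetical addresses, shortened for readability.
token = Token("0xalice", 100)
token.transfer("0xalice", "0xbob", 30)
print(token.balance_of("0xalice"), token.balance_of("0xbob"), token.balance_of("0xcarol"))
```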

Tokens are usually traded on decentralized exchanges (DEXs). Uniswap, for example, is a DEX composed of multiple liquidity pools, each representing a tradeable pair. It prices assets with a bonding curve, which essentially spreads the provided liquidity along a token price/supply curve.

Uniswap trade effect analysis from a Trader perspective.

There are some differences between a traditional order book and a bonding curve. With a traditional order book, a market order that reaches the engine is matched against resting limit orders, and the trade executes at their prices. A bonding curve instead follows a mathematical equation whose output depends on the latest state of the liquidity pool.
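To make the contrast concrete, here is a minimal sketch of both mechanisms. The order book side walks the resting sell orders from the cheapest level upwards. The bonding curve side assumes the constant-product rule (reserve_a * reserve_usd stays constant) popularized by Uniswap V2 and ignores fees; Uniswap V3, shown in the screenshot above, concentrates liquidity and is more involved. All numbers are made up.

```python
def market_buy(asks, quantity):
    """Fill a market buy against resting sell orders, cheapest level first.

    asks is a list of (price, size) tuples. Returns (filled, cost); the fill
    is partial if the book does not hold enough liquidity.
    """
    filled = 0.0
    cost = 0.0
    for price, size in sorted(asks):
        take = min(size, quantity - filled)
        filled += take
        cost += take * price
        if filled >= quantity:
            break
    return filled, cost


def constant_product_buy(reserve_a, reserve_usd, usd_in):
    """Amount of A received for usd_in, assuming reserve_a * reserve_usd = k and no fees."""
    k = reserve_a * reserve_usd
    new_reserve_usd = reserve_usd + usd_in
    new_reserve_a = k / new_reserve_usd
    return reserve_a - new_reserve_a


# Order book: buying 5 A consumes the 100.5 level and part of the 101.0 level.
filled, cost = market_buy([(100.5, 3), (101.0, 8)], 5)
print(filled, cost)

# Pool holding 1000 A and 100000 USD (spot price 100): 1000 USD buys slightly
# less than 10 A, because the price moves along the curve during the trade.
print(constant_product_buy(1000.0, 100000.0, 1000.0))
```

In both cases a larger trade gets a worse average price, but for different reasons: the order book runs out of size at each level, while the pool price is recomputed from its reserves.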

Arbitrages and Multi-hops

An asset may be traded in multiple liquidity pools, and even on multiple exchanges. If two pools for the same pair A/B quote different prices, an arbitrage trade consists of buying in the pool where the price is lower and selling the bought quantity in the pool where it is higher. The idea generalizes to a chain of several liquidity pools with an overall discrepancy; an arbitrage trade routed through them is called multi-hop.
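Both ideas can be sketched as simple checks. The pool names and rates below are hypothetical, and the sketch ignores fees, gas and price impact, all of which decide whether a real trade is profitable.

```python
from math import prod


def spot_arbitrage(prices):
    """Given pool name -> price of A in B, return (buy_pool, sell_pool, gap)."""
    buy_pool = min(prices, key=prices.get)
    sell_pool = max(prices, key=prices.get)
    return buy_pool, sell_pool, prices[sell_pool] - prices[buy_pool]


def cycle_return(rates):
    """Multiply the exchange rates along a closed path of hops.

    A result above 1 means the path ends with more of the starting token
    than it began with.
    """
    return prod(rates)


# Two pools quoting the same pair A/B at different prices.
print(spot_arbitrage({"pool_1": 100.0, "pool_2": 101.5}))

# Multi-hop: A -> B -> C -> A, each rate being units received per unit given.
print(cycle_return([2.0, 1.5, 0.35]))  # roughly 1.05, a 5% gross gain
```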

Multihop arbitrage between a Sushiswap pool and two Uniswap pools

These kinds of transactions are tricky to parse, mainly because several questions have to be answered before the data means anything. Should I take every swap into account? Should I only process the asset given and the asset received? Which exchange should the trade be attributed to? And many more…
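One possible answer to the second question is to net out the token flows of the whole transaction, so that intermediate tokens cancel and only the asset given and the asset received remain. The sketch below assumes the swaps have already been decoded into tuples; the tuple layout, pool names and amounts are hypothetical.

```python
from collections import defaultdict


def net_flows(swaps):
    """Net token balance changes for the trader across all swaps in a transaction.

    Each swap is (pool, token_in, amount_in, token_out, amount_out), seen from
    the trader's side. Tokens whose flows cancel out are dropped.
    """
    flows = defaultdict(int)
    for _pool, token_in, amount_in, token_out, amount_out in swaps:
        flows[token_in] -= amount_in
        flows[token_out] += amount_out
    return {token: delta for token, delta in flows.items() if delta != 0}


# A three-hop path through one Sushiswap pool and two Uniswap pools.
swaps = [
    ("sushiswap_pool", "A", 1000, "B", 2000),
    ("uniswap_pool_1", "B", 2000, "C", 1990),
    ("uniswap_pool_2", "C", 1990, "A", 1010),
]
print(net_flows(swaps))  # only the net gain in A survives
```

Netting hides which exchange each hop used, so a parser that needs per-exchange volume would have to keep the individual swaps as well.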

OHLCV data

This type of data is named after the initials of Open, High, Low, Close and Volume. The format is time-based, which is why the dataset usually includes Opentime (and Closetime) columns to give context. The data can also carry extra columns with more information about what happened in the chosen timeframe. This information is typically used to compute new features or indicators. Column by column: Open is the price at which the asset was last traded at Opentime, and Close is the price at which it was last traded at Closetime. High and Low are, respectively, the highest and lowest prices at which the asset traded between Opentime and Closetime. Lastly, Volume is the quantity of the asset traded in that interval.
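A minimal sketch of how such rows could be derived from trades data follows. It assumes trades arrive as (timestamp, price, quantity) tuples and, as a simplification, takes Open as the first trade inside each interval rather than the last trade before Opentime. The function name and the sample trades are made up.

```python
def trades_to_ohlcv(trades, interval):
    """Aggregate (timestamp, price, quantity) trades into fixed-width candles.

    interval uses the same unit as the timestamps. Intervals without trades
    produce no candle.
    """
    candles = {}
    # Sort by timestamp only; trades sharing a timestamp keep their input order.
    for ts, price, qty in sorted(trades, key=lambda trade: trade[0]):
        open_time = ts - ts % interval
        candle = candles.get(open_time)
        if candle is None:
            candles[open_time] = {
                "open_time": open_time,
                "close_time": open_time + interval,
                "open": price, "high": price, "low": price, "close": price,
                "volume": qty,
            }
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price
            candle["volume"] += qty
    return [candles[t] for t in sorted(candles)]


# Timestamps in seconds, aggregated into 60-second candles.
trades = [(0, 100.0, 1.0), (20, 102.0, 0.5), (45, 99.0, 2.0), (70, 101.0, 1.0)]
for candle in trades_to_ohlcv(trades, 60):
    print(candle)
```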

Tradingview candlestick OHLCV view.

The candlestick format is the most common form of price visualization in specialized software today, but that wasn’t always the case. Candlestick charts were developed in the 18th century in Japan by a rice trader called Munehisa Homma. In the western world, they weren’t used until 1991, with the arrival of a book by Steve Nison called “Japanese Candlestick Charting Techniques”. The body of each candle marks the Open and Close levels, while its shadow (the thin vertical line) marks the High and Low levels.

There is a plethora of options when it comes to data visualization. Candlestick charts aren’t the only one, just the most common, along with bar charts (both convey exactly the same information). These two charts can be built from OHLCV data, but that isn’t the only valid way to study the markets. Depending on the data one finds valuable for the analysis, the visualization method will vary.


We’ve covered how order books work, looking at their internal structure and the main types of orders involved in price discovery. Since blockchains host decentralized exchanges, we also gave an overview of bonding curves and how their formula produces dynamic, deterministic pricing for assets.

Some of the transactions broadcast to the Ethereum network are sent in order to trade assets. Assets are usually represented by smart contracts and swapped on decentralized exchanges. Looking closely enough, one will find cases of multi-hop swaps: transactions that perform multiple swaps across different liquidity pools, as shown in the image of a multi-hop swap involving Sushiswap and Uniswap. Sometimes swaps are broadcast with the intention of performing arbitrage trades, and there is a lot to say about them that may be covered in another article.

Trades data is one of the purest pieces of information about prices, as it can be used as the base for calculating OHLCV data. When OHLCV data is available, one can easily generate a candlestick or bar chart, or any of their variations. Nonetheless, it doesn’t show all the available information about prices, and some specialized agents may prefer other types of data visualization.

By Alfonso Camblor García