On data and certainty


Being certain is a lovely thing. Despite what many would allege about the poor finality of proof of work, the relative certainty it provides is part of the appeal. Once that inbound transaction is buried six confirmations deep, it’s almost certainly yours. Of course, even more certainty is achievable with an in-person cash transaction. But you can’t send those over the internet.

It’s no surprise that users of Bitcoin and other cryptocurrencies have a near-insatiable hunger for data. The tools that extract meaningful data from block headers and node information and convert them into a useful format are vital. Whether it’s coin.dance, fork.lol, flippening.watch, or segwit.party, this industry has always benefited from useful interpretations of non-human-readable data broadcast on various blockchains. Other services we use virtually every day are p2sh.info, blockchain.info, etherscan.io, ethgasstation.info, and jochen’s mempool stats.

These aren’t just for analysts – they are for normal businesses using cryptocurrency or typical users looking to set the right fee. Some of these interpretations have constraints, of course, and occasionally the data offered by free services is poor or misinterpreted. Since we run a website in that same category – we want to empower users to make good decisions – we feel that we have a duty to make the shortcomings of our data starkly clear.

The previous installment in the series of user education can be found here. We describe the difficulties inherent to estimating on-chain transaction volume, and the constraints of our own estimate. For UTXO chains, our estimates are suboptimal. For account-based systems (Ethereum and its derivatives), we have a better idea, albeit not a perfect one.

This post is a brief attempt to qualify our data and to give you an idea of which figures we have a high degree of confidence in, and which ones we don’t.

Let’s jump straight to the CSV files available at Data Downloads. Virtually everything found on our charts page is built out of these pieces of data. It’s just a matter of knowing how to put them together. This is what the last two weeks of data look like for Bitcoin:

Let’s go through these columns carefully. First, you have the date. Not much to talk about there. We use the Gregorian calendar. Daily closes for our price quotes occur at 00:00 UTC (that’s 7 pm EST).

Next you have txVolume(USD). That’s what we’re talking about when we say “on-chain transaction volume.” Simply put, it’s a broad and largely unadjusted measure of the total value of outputs on the blockchain on a given day. This is an answer to the question “approximately how much value, denominated in USD, circulates on the Bitcoin blockchain per day?”

That said, on-chain transaction volume in practice is a very hard thing to estimate properly. We discuss that at length in this post. Please read it! We don’t want to give anyone a false degree of certainty in the number. We are currently working through various methods to improve the estimate, and maybe get closer to blockchain.info’s adjusted figure – but that will take time. So bear with us.

The third column is txCount. That refers to the number of transactions happening on the public blockchain per day. Be aware that for low-fee blockchains, it’s really easy to fabricate a whole bunch of transactions. So this isn’t necessarily that reliable! Additionally, UTXO networks like Bitcoin can batch a whole bunch of payments into one transaction, so txCount undercounts activity on those networks. You therefore have to be careful comparing the number of transactions on Ethereum with Bitcoin; by its very nature, Bitcoin typically has more payments than that datapoint suggests. Naively comparing UTXO-based and account-based systems by transaction count is like watching motorway traffic and comparing the number of buses versus motorcycles to guess at how many people are making trips. Maybe there are the same number of buses and motorcycles – but each bus might have 50 people inside of it.

Here’s a very typical example of a batched Bitcoin transaction. One sender, lots of receivers.

This is fairly typical for Bitcoin. Maybe it’s a mining pool paying out, or an exchange paying multiple users at once. Batching transactions saves space, and is more cost-efficient, so it’s encouraged. It also means that just counting transactions for Bitcoin (or other UTXO chains) isn’t likely to yield a reliable estimate of how many actual transactions are occurring. The useful site outputs.today popped up recently to make this point. Of course, outputs.today might be including change outputs (we’re not sure if they are or not), which aren’t meaningful transfers. But even if you conservatively assumed that half of all bitcoin outputs tallied on outputs.today are change outputs, you would still have a higher number of outputs than just the raw number of transactions. So please be aware of the fact that Bitcoin transactions have the flexibility of email (one can send to many), constrained only by the blocksize and the willingness of miners to include large transactions.
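To make the gap between transactions and outputs concrete, here is a minimal sketch. The output counts are invented for illustration (they are not real chain data), and the 50% change share is simply the conservative assumption mentioned above, not a measured figure.

```python
# Hypothetical sample: number of outputs in each of six transactions.
# These figures are invented for illustration only.
sample_output_counts = [2, 2, 150, 2, 40, 1]


def estimate_payments(output_counts, change_share=0.5):
    """Estimate payments by discarding an assumed share of outputs as change."""
    total_outputs = sum(output_counts)
    return total_outputs * (1 - change_share)


tx_count = len(sample_output_counts)
payments = estimate_payments(sample_output_counts)

print("transactions:", tx_count)
print("estimated payments:", payments)
```

Even after throwing away half of the outputs, the two batched transactions in this made-up sample push the payment estimate well above the raw transaction count.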

Next, marketcap(USD). This is of course the unit price multiplied by the number of units in circulation. There has been quite a bit of controversy over this indicator. We still like this post from the Sia guys on the topic. Marketcap or network value is definitely flawed. It becomes less tethered to reality the smaller the float is. Float means the ratio of actual circulating units to the total number of units. Ripple, for instance, has a fairly small float, so one should probably be skeptical of its “market cap.” OnChainFx is doing a lot of good work on the issue.
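As a rough sketch of the two definitions above, the snippet below computes market cap and float for a made-up asset. The price and supply numbers are hypothetical and do not describe any real coin; the function names are ours, not part of any data file.

```python
def market_cap(unit_price_usd, circulating_units):
    """Market cap: unit price multiplied by the units in circulation."""
    return unit_price_usd * circulating_units


def float_ratio(circulating_units, total_units):
    """Float: ratio of actually circulating units to total units."""
    return circulating_units / total_units


# Hypothetical asset, for illustration only.
price_usd = 2.0
circulating = 40_000_000
total = 100_000_000

print("market cap (USD):", market_cap(price_usd, circulating))
print("float:", float_ratio(circulating, total))
```

The smaller the second number, the more cautiously the first should be read.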

Price is price. Not much to say about that one. We get it from CoinMarketCap, with all the caveats that entails. Be advised – it’s the opening price.

exchangevolume(USD) is, as you might expect, the dollar value of the volume at exchanges like GDAX and Bitfinex. We get this data from CoinMarketCap, who have a bit of a conflicted history with the figure, having deleted and re-added Korean exchange figures. It doesn’t include data on OTC exchanges, which account for a meaningful portion of all global exchange. Remember that 0-fee exchange volume should be taken with a grain of salt.

Next, generatedCoins. This refers to the number of new coins that have been brought into existence on that day. We count up the actual number of newly-minted coins, rather than using the stated inflation figures (i.e. for bitcoin you should expect 12.5 per block, every ten minutes, giving you 12.5*6*24 = 1800 coins per day). You can see that we’ve been exceeding 1800 coins per day recently – this is due to lots of new hashpower coming online and making blocks arrive every 8 or 9 minutes rather than every 10. In practice, since hashpower is continually added to the system, Bitcoin inflation progresses slightly faster than its theoretical rate. This is also why our figures differ from those of other websites – we count up the actual number of new coins rather than just assuming the official inflation rates are correct.
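The arithmetic behind that 1800 figure, and what faster blocks do to it, can be sketched as follows. The 12.5 reward and the 8, 9 and 10 minute intervals come from the paragraph above; the function name is ours, and the calculation assumes a constant average block interval over the whole day.

```python
MINUTES_PER_DAY = 24 * 60
BLOCK_REWARD_BTC = 12.5  # reward per block cited in the text


def coins_per_day(avg_block_minutes, reward=BLOCK_REWARD_BTC):
    """New coins per day, assuming a constant average block interval."""
    blocks_per_day = MINUTES_PER_DAY / avg_block_minutes
    return reward * blocks_per_day


for minutes in (10, 9, 8):
    print(f"{minutes}-minute blocks: {coins_per_day(minutes)} coins per day")
```

At the theoretical 10-minute interval this gives 1800 coins per day; at 9 and 8 minutes it gives 2000 and 2250, which is why counting actual newly-minted coins yields higher numbers than the stated schedule.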

Lastly, Fees. Fees in our data are based on the native currency, not USD. So on January 28th, fees totaled 168.25 BTC. That’s about $1.88m. This has been a source of confusion for many, so again – fees are counted in the native currency. You have to multiply by unit price to obtain the USD value of fees. Fees are interesting because they can’t really be faked – either you want to use the blockchain, and you’ll pay to do it, or you choose not to.
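A minimal sketch of that conversion is below. The 168.25 BTC figure is from the paragraph above; the unit price of 11,174 USD is an assumption, back-derived from the "about $1.88m" figure rather than taken from the data file.

```python
def fees_in_usd(fees_native, unit_price_usd):
    """Convert fees counted in the native currency into USD."""
    return fees_native * unit_price_usd


fees_btc = 168.25             # fees on January 28th, from the text
assumed_price_usd = 11_174    # assumption: implied by the ~$1.88m figure

fees_usd = fees_in_usd(fees_btc, assumed_price_usd)
print(f"{fees_btc} BTC is about ${fees_usd:,.0f}")
```

With the real CSV you would multiply the fees column by the price column for each row, keeping in mind that the price is the opening price for that day.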

So to conclude, on-chain volume and transaction count can both be faked and can be tricky to estimate. Exchange volume must be viewed fairly skeptically. Market cap has a whole host of methodological issues. Generated coins and fees, however, are much more concrete.

We hope this helps you assess our data. Do not take it as gospel – use it only with the appropriate dose of skepticism.


  1. Hi there,

    thank you for providing the dataset. It is very useful for my research.

    Here I have a question:

    In the file lsk.csv, in the beginning period, the marketcap is zero but the exchange volume is not. How can this happen? Is it an error in the construction?

    All the best, Cedric

    1. I might understand it after I read your post on the marketcap. Maybe there is no circulation in that moment.


  2. “Once that inbound transaction is buried six confirmations deep, it’s almost certainly yours. Of course, even more certainty is achievable with an in-person cash transaction.”

    The cash could be a high-grade counterfeit, and the transaction effectively “reversed” when the seller attempts to deposit the cash at the bank. That probability, however small, ought to be compared against the probability of an X block deep chain reorg.

    1. Hello Caesar, please take a look at “Data sources and methodology” section at the bottom of Data Downloads page.

  3. Hi,

    thanks for the data!

    Here is a question. You say

    “Daily closes for our price quotes occur at 00:00 UTC (that’s 7 pm EST).”
    “Price is price. Not much to say about that one. We get it from CoinMarketCap, with all the
    caveats that entails. Be advised – it’s the opening price.”

    So is the price in the row – 1/13/18 the opening or the closing price on 1/13/18? According to your description, it should be the opening price on 1/13/18, namely, 1/13/18 00:00 UTC (the closing time) minus 24 hours, not the opening price on 1/14/18. Am I correct?

    But this way of saving the data is not very usual. Maybe I am wrong. But thanks a lot if you can clarify!

    1. Hi Mat, sorry for the confusion. To clarify, the price quoted on 1/13/18 is the opening price. Ignore the “daily closes…” language. We will fix that.

    1. yes. block size is the number of bytes broadcast in final blocks in that 24-hour period. (divide by 1m to get daily throughput in MB). block count is simply how many blocks there were on that chain in the 24-hour period. you can divide block size by block count to get average block size.

  4. Hi

    Could you explain what the average difficulty variable represents (I have the BTC, ETH, and LTC downloads)

    Thanks for your help

    1. “Difficulty” is a variable that PoW blockchains use to determine how hard it is to solve the hash function to find a new block. It is a proxy for hash power. For bitcoin it changes every 2016 blocks (~2 weeks) – the idea is to keep the average block time roughly constant, even in the face of variable hash power. See more: https://en.bitcoin.it/wiki/Difficulty

      Practically, you can use it as a smoothed version of hash power. They are roughly the same.

  5. Greetings,

    First of all, thank you for providing all this data for free and easily accessible.

    While playing with the CSV files I’ve noticed that the data lines (i.e., lines 2 and onwards) have an extra comma at the end; the header (line 1) does not. This somehow confuses some CSV loaders, such as (Python’s) pandas read_csv, as they interpret it as an extra column with an empty value.

    I understand this is probably a detail and it’s not hard to use some string replacement function to get rid of the extra comma. However, if it’s not too much trouble, it would be nice if either the extra comma would be removed from the data lines or the header line included the extra comma (so at least we would have an empty but consistent column).

    Thanks again for the great work putting this site together, gathering and explaining the data.

  6. Hello,

    first – thank you very much for sharing the data, it is extremely useful. A question – when I look at your Ripple historical data, it ends on 13 Aug, 2018 – obviously smthg went wrong here, can you please check if/when possible?

    Thanks and best regards,


    1. Hi Michael, we are working on fixing the Ripple data. Their nodes are difficult to wrangle. Thank you for letting us know, and check back soon.

  7. The Ripple is only updated to 11/19.
    Could you update the Ripple data?

    I really hope you can increase other crypto currencies like BNB…
    Thanks for your huge effort about crypto market data.

  8. Hi, Thank you very much for the data. I am about to start my dissertation on cryptocurrencies and will be using this data extensively, so thank you very much for providing it.
    I was wondering if you had data on current circulating supply of the major currencies at the end of each day. Or would i be able to calculate this by summing up the generated coins?

    Thanks again.

  9. hi
    congratulations for your great work. i’m about learning. first i was trying to understand the cryptos. than i got daily values from different sources (bitinfochart, coinapi and yours). trying to understand the real world i analyzed the data. first i learned there is one source the blockchain but not only one truth about daily changes – interesting isn’t it? i’m not talking about price … there are a lot of truths of course.
    internaly raised the questions … (not critical)
    – why is the info not organized data into 2 buckets: related to coins and related to exchanges?
    — coins would be: supplied, transactions, adjusted transactions, fees, generated, burned and so on
    — exchange would be: price open, low, high, close per pair and exchange (you might have a glimps @coinapi)
    – documented is a price – pointing to CoinMarketCap who are providing data for 17+ marketplaces. can you specify the marketplaces your opening price is based on?
    – tables are verry different in content …
    — eos does not show supplied, active addresses and no fees; eventhough eos does work with fees. all coins are supplied – yes. but far not all are hold by public addresses. blackone controlls most of them.
    — ada does not show supplied
    – theres a subject others do talk about – inflation – deflation. where can i get your thougths?

    have fun
