
Sunday 5 July 2015

FAQ: Bitcoin mining and the luck statistic

1st July 2015

0. Introduction
I have a few posts that are hit by multiple readers every day, and some of these I knew would be references for years to come. This post is one of those.

To be honest, I should have written this a couple of years ago. I've posted the information in various parts of bitcointalk, but I keep losing the links. So when I noticed the most recent flare-up of "this is a bad luck pool, I'm leaving for a good luck pool" panic, I decided I'd better put aside the other projects I was working on (the Block attribution FAQ will be finished soonish) and post this.

My aim is for any brand new miner to be able to determine just how unlikely any run of bad luck is, and so reduce the overall level of panic amongst miners. Mining panic has been exacerbated by reports of accidental block withholding attacks and a stratum vulnerability. Wouldn't you prefer to know if your panic was actually warranted?

Since this FAQ will be used as a reference for years to come, I'll update it from time to time, so if you think I'm missing something, or I could explain some point better, let me know in the comments.

1. Gambler's fallacy
Miners who have been around for more than a year or two have seen good and bad luck (unless they mine at a "pay per share" pool, in which case they are not subject to luck at all) and know that it averages out in the long term. However, every new miner striking a run of bad luck will flail around, looking to escape to another pool that is not having bad luck. This sort of response to random events can be thought of as a type of gambler's fallacy: a pool's past luck has no bearing on its future luck.

2. Bad luck lasts longer
Another reason we misjudge mining luck is that when we mine, we mostly experience bad luck. In fact, if you go to the trouble of working it out, your hours of mining will be about one-quarter good luck and three-quarters bad luck. Why? Bad luck rounds take longer, while good luck rounds take much less time.
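If you'd like to check the one-quarter figure yourself, here is a minimal Python sketch. It assumes an idealised model rather than data from any real pool: constant hashrate and difficulty, so each round's length, measured in units of the expected round length, is exponentially distributed with mean 1. Under that model the share of mining time spent in good luck rounds is the integral of x·e^(-x) from 0 to 1, which works out to 1 - 2/e, or about 26%. The function name `bad_luck_time_fraction` is just an illustrative choice.

```python
import math
import random

def bad_luck_time_fraction(n_rounds: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the fraction of mining time spent in bad luck rounds.

    Model assumption: constant hashrate and difficulty, so each round's length,
    in units of the expected round length, is exponential with mean 1.0.
    A round counts as bad luck when it takes longer than expected (length > 1.0).
    """
    rng = random.Random(seed)
    total_time = 0.0
    bad_time = 0.0
    for _ in range(n_rounds):
        length = rng.expovariate(1.0)  # rate 1.0, so mean round length 1.0
        total_time += length
        if length > 1.0:
            bad_time += length
    return bad_time / total_time

# Exact answer under the same model: good luck time = integral of x*exp(-x) on [0, 1] = 1 - 2/e
good_time_exact = 1 - 2 / math.e
print(f"good luck share of time (exact):     {good_time_exact:.3f}")      # 0.264
print(f"bad luck share of time (exact):      {1 - good_time_exact:.3f}")  # 0.736
print(f"bad luck share of time (simulated):  {bad_luck_time_fraction():.3f}")
print(f"share of rounds that are good luck:  {1 - math.exp(-1):.3f}")     # 0.632
```

The last line shows the flip side: under the same model about 63% of rounds are good luck, but because they are short they only add up to about a quarter of your mining time.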

3. Assessing luck over time instead of blocks
Another mistake made by novice miners is to assume that the extremes of luck will be the same for all pools over any time frame. This is wrong for two related reasons:
  1. The more blocks are solved, the closer luck gets to 100%.
  2. Because the time frame for luck to approach 100% depends on the number of blocks solved, comparing different pools' luck over the same time period is invalid. Instead, we need to compare luck over a similar number of blocks.
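A short worked calculation makes point 2 concrete. It uses the model from section 4 (each block's actual/expected shares ratio X_i is exponentially distributed with mean 1 and variance 1), and the two pools are hypothetical: a small one solving 10 blocks in a month and a large one solving 300 blocks in the same month. The spread of the luck statistic L_n shrinks with the square root of the number of blocks, so the same month says far less about the small pool.

```latex
L_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathbb{E}[L_n] = 1, \qquad
\operatorname{SD}[L_n] = \sqrt{\frac{\operatorname{Var}[X_i]}{n}} = \frac{1}{\sqrt{n}}

n = 10:\ \operatorname{SD}[L_{10}] \approx 0.32, \qquad
n = 300:\ \operatorname{SD}[L_{300}] \approx 0.058
```
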
4. The luck statistic, the Erlang distribution, PDFs and CDFs
We'll start getting a little mathy here, so if you're feeling fragile, just skip ahead to section 7, where I have a nice little "luck reference chart".

I'll try to avoid terms like "variance" and "median" and "maths" in order to not scare away too many readers, but we do need a definition:
Luck statistic = mean(actual shares per round / expected shares per round)
Luck = 1 / Luck statistic
I would much rather just refer to the luck statistic as luck (and did so in many previous posts), but due to our psychological preference for a luck scale where bigger is better, we need both measures: "Luck" as shorthand for "How much am I earning as a percentage of what I expect to earn?", and the luck statistic. Just keep in mind that the larger the luck statistic, the worse the luck.
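Here is a minimal Python sketch of the two measures, assuming the expected number of shares per round stayed constant over the rounds being averaged (that is, no difficulty change in between). The round share counts are made up for illustration, and `luck_statistic` and `luck` are hypothetical helper names, not taken from any pool software.

```python
from statistics import mean

def luck_statistic(actual_shares: list[float], expected_shares: float) -> float:
    """Mean of (actual / expected) shares per round; larger means worse luck."""
    return mean(a / expected_shares for a in actual_shares)

def luck(actual_shares: list[float], expected_shares: float) -> float:
    """Luck as the reciprocal of the luck statistic; larger means better luck."""
    return 1.0 / luck_statistic(actual_shares, expected_shares)

# Hypothetical example: four rounds with a constant 1000 expected shares per round
rounds = [500.0, 1500.0, 2000.0, 1000.0]
print(luck_statistic(rounds, 1000.0))  # 1.25
print(f"{luck(rounds, 1000.0):.0%}")   # 80%
```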

The luck statistic is (after scaling) negative binomially distributed, but it can be very closely approximated by a known and well understood distribution (the Erlang distribution), which makes calculating probabilities simpler.

The approximation becomes more accurate as difficulty increases. Think of Euler's (1 + 1/n)^n approximation to e as the comparison of an exponentially distributed random variable (Erlang distribution, shape parameter = 1) and a geometrically distributed random variable (negative binomial distribution, size parameter = 1, probability = 1/n). In case you're worried about the approximation leading to significant error, at current difficulty you won't see a probability error greater than 0.0000000001.
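One way to see the approximation at work is to compare tail probabilities directly. If n is the expected number of shares per block (roughly network difficulty divided by share difficulty), the chance that a single round needs more than k times the expected shares is (1 - 1/n)^(kn) under the geometric model and e^(-k) under the exponential model. This Python sketch prints the largest gap over a few values of k as n grows; the n values are illustrative, not actual difficulties, and the helper names are hypothetical.

```python
import math

def geometric_tail(k: float, n: float) -> float:
    """P(a round needs more than k*n shares) when each share has a 1/n chance of solving a block."""
    # (1 - 1/n) ** (k * n), computed via log1p so it stays accurate for huge n
    return math.exp(k * n * math.log1p(-1.0 / n))

def exponential_tail(k: float) -> float:
    """P(a single round's luck statistic exceeds k) under the exponential (Erlang shape 1) model."""
    return math.exp(-k)

def max_tail_error(n: float, ks: tuple[float, ...] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Largest absolute gap between the two tail probabilities over the chosen k values."""
    return max(abs(geometric_tail(k, n) - exponential_tail(k)) for k in ks)

for n in (10, 1_000, 1_000_000, 1e12):
    print(f"n = {n:.0e}: max tail error = {max_tail_error(n):.1e}")
```

The gap shrinks roughly in proportion to 1/n, which is why it becomes negligible at the very large n implied by real difficulties. `math.log1p` is used because forming 1 - 1/n directly loses precision when n is huge.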

Visualising the Erlang distribution:






Both plots illustrate:

  • The luck statistic gets closer to 1.0 as the number of blocks over which it is averaged increases
  • Extremes of luck are more likely when the luck statistic is averaged over fewer blocks.
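If you'd rather see numbers than plots, this Python sketch simulates the luck statistic averaged over different numbers of blocks and prints the range covering the middle 98% of outcomes. It assumes the Erlang model above, so each simulated value is a gamma variate with shape n and scale 1/n (mean 1.0); the trial count, seed and function name are arbitrary, illustrative choices.

```python
import random
from statistics import quantiles

def luck_statistic_range(n_blocks: int, trials: int = 20_000, seed: int = 42) -> tuple[float, float]:
    """Simulated 1st and 99th percentiles of the luck statistic averaged over n_blocks.

    Model assumption: the luck statistic over n_blocks follows Erlang(n_blocks, n_blocks),
    i.e. a gamma distribution with shape n_blocks and scale 1/n_blocks (mean 1.0).
    """
    rng = random.Random(seed)
    samples = [rng.gammavariate(n_blocks, 1.0 / n_blocks) for _ in range(trials)]
    cuts = quantiles(samples, n=100)  # 99 cut points: cuts[0] is the 1st percentile, cuts[98] the 99th
    return cuts[0], cuts[98]

for n in (1, 10, 100, 1000):
    low, high = luck_statistic_range(n)
    print(f"{n:>5} blocks: middle 98% of luck statistics between {low:.2f} and {high:.2f}")
```

The range narrows steadily as the number of blocks grows, which is the same point both plots make.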
5. Managing income variance
Luck averaged over more blocks means fewer extremes, so mining at a pool that solves more blocks in less time means you will experience less variation in payout. The catch is that you'll also be increasing the size of pools that are already large.

You can avoid this by adjusting your timescale expectations: try to focus on weekly income, or income per retarget, and you'll be less affected by income variations. Once your pool has solved about one hundred blocks, income will usually be within around +/- 20% of expected.

Your other option is to mine at a pool that has a pay per share (PPS) reward method, but this has a couple of downsides. The first is that, since the pool is smoothing out the income variations for you, if it doesn't manage that risk properly it could bankrupt itself, leaving you with lost income. The other problem is that since PPS is risky, not many pools want to provide it, so you won't have many options about where you can mine.

6. How can you calculate the CDF probability yourself? 
If you want to manage your expectations without using a PPS pool, you need to know what to expect: not just the reward per share, but the typical range of values you might encounter in a given time frame. If you have some experience with statistics or coding, you can use R, Mathematica or even Python, but you can also use the Wolfram Alpha website. Enter the luck statistic and the number of blocks over which the statistic was averaged, and you get the lower tail probability: the chance of a luck statistic that low or lower (i.e. luck at least that good).

CDF[ErlangDistribution[nblocks, nblocks], luck statistic]

For example, if the luck statistic was 1.1 over one hundred blocks, is that quite unlucky or just a little unlucky? Enter:

CDF[ErlangDistribution[100, 100], 1.1]
The result is 0.84, so in 84 out of one hundred re-runs of those one hundred blocks, we'd see better luck. Not that unlucky: about one in every six re-runs would be unluckier.
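If you'd rather use Python than Wolfram Alpha, here is a minimal standard-library sketch of the same calculation, with shape and rate both set to the number of blocks as in the template above. It relies on the standard identity for an Erlang distribution with shape k and rate λ: P(X > x) = Σ_{i=0}^{k-1} e^(-λx)(λx)^i / i!, so the CDF is one minus that sum. `erlang_cdf` is a hypothetical helper name.

```python
import math

def erlang_cdf(x: float, k: int, rate: float) -> float:
    """Lower tail probability P(X <= x) for an Erlang(k, rate) random variable.

    Uses P(X > x) = sum over i = 0..k-1 of exp(-rate*x) * (rate*x)**i / i!,
    with each term evaluated in log space so large k doesn't overflow.
    """
    if x <= 0:
        return 0.0
    lam = rate * x
    upper = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k))
    return min(1.0, max(0.0, 1.0 - upper))

# Same question as CDF[ErlangDistribution[100, 100], 1.1]
p = erlang_cdf(1.1, 100, 100)
print(f"lower tail: {p:.2f}, chance of worse luck: about 1 in {1 / (1 - p):.0f}")
# lower tail: 0.84, chance of worse luck: about 1 in 6
```

If you have SciPy installed, `scipy.stats.gamma.cdf(1.1, a=100, scale=1/100)` should give the same answer, since an Erlang distribution is a gamma distribution with integer shape.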

How can you calculate the probable luck outcomes yourself?
Rather than assessing how lucky or unlucky your pool has been, planning requires you to estimate how unlucky it could be in future. Let's say you plan to be able to manage a monthly worst case at the 0.999 level (only one in a thousand re-runs of the month's blocks would be worse), and you expect your pool to solve around 50 blocks in that time.

Quantile[ErlangDistribution[50, 50], 0.999]
This results in a luck statistic of ~1.495, or a luck of 1/1.495 = 66.9%
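The quantile calculation above can also be done in plain Python by bisection, since the CDF only ever increases. This sketch includes a hypothetical `erlang_cdf` helper (so it runs on its own) and an equally hypothetical `erlang_quantile`; the tolerance is an arbitrary choice.

```python
import math

def erlang_cdf(x: float, k: int, rate: float) -> float:
    """Lower tail probability P(X <= x) for an Erlang(k, rate) random variable."""
    if x <= 0:
        return 0.0
    lam = rate * x
    # P(X > x) = sum over i < k of exp(-lam) * lam**i / i!, each term in log space
    upper = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k))
    return min(1.0, max(0.0, 1.0 - upper))

def erlang_quantile(p: float, k: int, rate: float, tol: float = 1e-10) -> float:
    """Value x with erlang_cdf(x) = p, found by bisection (the CDF is increasing)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, k, rate) < p:  # widen the bracket until it contains the answer
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if erlang_cdf(mid, k, rate) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Same question as Quantile[ErlangDistribution[50, 50], 0.999]
stat = erlang_quantile(0.999, 50, 50)
print(f"luck statistic ~{stat:.3f}")
print(f"luck ~{1 / stat:.1%}")  # luck ~66.9%
```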


7. I need something easier. Or less statisticky, anyway.
OK, I hear you. My fun != your fun. This chart gives you the luck percentage (and it's all bad luck) for a range of chances of that luck or worse occurring, from 1/3 (not very unlucky) to 1/10000 (really quite unlucky). Use it to either plan for the future or get an idea of how lucky you've been.

For example, if my pool solves ten blocks at a luck of 80%, is that really bad? Not really. It'll happen around 20% of the time (a 1/5 chance of that luck or worse occurring). Or maybe I just want to make sure I can cope with a 1-in-1000 bad luck run of five hundred blocks (~87.4%).







I suppose you'll want a handy table? Use the first table to estimate worst-case luck scenarios: find the chance of bad luck (or worse) down the side, and just read off the percentage luck. Use the second table to read off lower tail probabilities for even luck percentages.
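If you need block counts or probabilities that the tables don't cover, this Python sketch builds both kinds of table under the same Erlang model: worst-case luck for a given chance of that luck or worse (1/3 down to 1/10000) across a few block counts, and lower tail probabilities for a few luck percentages. The block counts and luck values are arbitrary picks and all helper names are hypothetical, so treat it as a sketch rather than a digit-for-digit copy of the charts.

```python
import math

def erlang_cdf(x: float, k: int, rate: float) -> float:
    """Lower tail probability P(X <= x) for an Erlang(k, rate) random variable."""
    if x <= 0:
        return 0.0
    lam = rate * x
    # P(X > x) = sum over i < k of exp(-lam) * lam**i / i!, each term in log space
    upper = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k))
    return min(1.0, max(0.0, 1.0 - upper))

def erlang_quantile(p: float, k: int, rate: float, tol: float = 1e-10) -> float:
    """Value x with erlang_cdf(x) = p, found by bisection (the CDF is increasing)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, k, rate) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if erlang_cdf(mid, k, rate) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def worst_case_luck(chance: float, n_blocks: int) -> float:
    """Luck (as a fraction) with the given chance of that luck or worse over n_blocks."""
    return 1.0 / erlang_quantile(1.0 - chance, n_blocks, n_blocks)

def chance_of_luck_or_worse(luck: float, n_blocks: int) -> float:
    """Chance that luck over n_blocks is this bad or worse (1 minus the lower tail)."""
    return 1.0 - erlang_cdf(1.0 / luck, n_blocks, n_blocks)

blocks = (10, 50, 100, 500, 1000)

print("Worst-case luck: chance of that luck or worse (rows) by blocks (columns)")
print(f"{'chance':<9}" + "".join(f"{n:>8}" for n in blocks))
for chance in (1 / 3, 1 / 10, 1 / 100, 1 / 1000, 1 / 10000):
    row = "".join(f"{worst_case_luck(chance, n):>8.1%}" for n in blocks)
    print(f"{'1/' + str(round(1 / chance)):<9}" + row)

print()
print("Lower tail probability CDF(1/luck): luck (rows) by blocks (columns)")
print(f"{'luck':<9}" + "".join(f"{n:>8}" for n in blocks))
for luck_pct in (0.95, 0.90, 0.80, 0.70):
    row = "".join(f"{erlang_cdf(1 / luck_pct, n, n):>8.4f}" for n in blocks)
    print(f"{luck_pct:<9.0%}" + row)

print()
print(f"{chance_of_luck_or_worse(0.80, 10):.2f}")  # 0.20
```

The final line reproduces the ten-block example: luck of 80% or worse over ten blocks has roughly a 1 in 5 chance.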






8. Summary

If I have to boil this information down to its important elements:
  • Variance in income decreases as the number of blocks solved increases.
  • Variance in income is not a function of time.
  • Learn how to plan for bad luck, and to check that your pool's luck is not impossibly bad.
I hope I've made this a bit less tricky for you - post in the comments if I haven't been clear or need to add in more information. I expect this post to be a long term work in progress.




organofcorti.blogspot.com is a reader supported blog:
1QC2KE4GZ4SZ8AnpwVT483D2E97SLHTGCG

Created using R and various packages, especially dplyr, data.table, ggplot2 and forecast.



Find a typo or spelling error? Email me with the details at organofcorti@organofcorti.org and if you're the first to email me I'll pay you 0.01 btc per ten errors.

Please refer to the most recent blog post for current rates or rule changes.

I'm terrible at proofreading, so some of these posts may be worth quite a bit to the keen reader.
Exceptions:
  • Errors in text repeated across multiple posts: I will only pay for the most recent errors rather than for every single occurrence.
  • Errors in chart texts: since I can't fix the chart texts (I don't keep the data that generated them), I can't pay for them. Still, they would be nice to know about!
I write in British English.







