Lecture Seven: Beta-Binomial Inference¶

Beta Densities¶

Let us start today by noting some general properties of Beta densities. For two positive real numbers $a$ and $b$, the $\text{Beta}(a, b)$ density is given by: \begin{align*} f_{\text{Beta}(a, b)}(u) = \frac{u^{a-1} (1-u)^{b-1} I\{0 \leq u \leq 1\}}{\int_0^1 u^{a-1} (1-u)^{b-1} du} \end{align*} The numbers $a$ and $b$ control the shape of the Beta density. The following facts on the $\text{Beta}(a, b)$ distribution will be quite useful for rest of today's lecture:

  1. The simplest Beta density is the uniform density corresponding to $a = 1$ and $b = 1$ which is flat on $[0, 1]$.
  2. The mean of the Beta distribution is $\frac{a}{a+b}$. For example, when $a = 10$ and $b = 30$, the mean equals $0.25$. On the other hand, when $a = 30$ and $b = 10$, the mean equals $0.75$.
  3. The variance of the Beta distribution is $\frac{ab}{(a+b)^2(a+b+1)} = \frac{a}{a+b} \frac{b}{a+b} \frac{1}{a+b+1}$. Thus when $a+b$ is large, the variance will be small and the Beta density will look quite skinny around its mean.

You can learn a lot more about Beta Densities from its wikipedia page.

In [1]:
#Plotting Beta densities:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

a = 10
b = 100000
#x = np.linspace(0, 1, 1000)
x = np.arange(0, 8e-4, 1e-6)
y = beta.pdf(x, a, b)
plt.plot(x, y, label = f"Beta({a}, {b})")
plt.title(f'Beta Density for a = {a} and b = {b}')
plt.xlabel("x")
plt.ylabel("Density")
plt.legend()
plt.show()
No description has been provided for this image

Recap of Last Lecture¶

In the last lecture, we focused on the following problem: estimate the quality of a product from reviews. We used $\theta$ to denote the quality of the product. The reviews are summarized by the number of positive reviews ($Pos$) and the number of negative reviews ($Neg$); the total number of reviews is $Tot = Pos + Neg$. The likelihood that we worked with was the Bernoulli likelihood given by: \begin{align*} \mathbb{P} \{\text{reviews} \mid \theta = u \} = u^{Pos} (1-u)^{Neg}. \end{align*} The naive estimate for $\theta$ is the proportion $\frac{Pos}{Tot} = \frac{Pos}{Pos + Neg}$. This may be reasonable when the number of reviews is large but quite unreasonable when the number of reviews is small. For example, when $Pos = 3$, $Neg = 0$ (so that $Tot = 3$), the proportion is 1 which is clearly not a good estimate of $\theta$ (we do not want to infer that the microwave is perfect based on only three reviews). In the sequel, the names "naive estimate", "proportion", "naive proportion", "MLE", "Frequentist Estimate" all refer to the same thing.

The Bayesian approach uses probability and supplements the likelihood with a prior on $\theta$. The prior reflects the uncertainty in $\theta$ before observing review data. In the last class, we worked with the uniform prior: $\theta \sim \text{Uniform}[0, 1]$. We saw that the posterior is given by $\text{Beta}(1 + Pos, 1 + Neg)$. In other words, \begin{align*} \text{uniform prior} ~~ and ~~ \text{Bernoulli likelihood} ~~ \implies ~~ \text{Beta}(1 + Pos, 1 + Neg) \text{ posterior} \end{align*}

More General Priors¶

Because the uniform prior is also $\text{Beta}(1, 1)$, we can rewrite the above as: \begin{align*} \text{Beta}(1, 1) \text{ prior} ~~ + ~~ \text{Bernoulli likelihood} ~~ \implies ~~ \text{Beta}(1 + Pos, 1 + Neg) \text{ posterior} \end{align*} The above fact actually works if we replace 1 and 1 by any positive $a$ and $b$. In other words, \begin{align*} \text{Beta}(a, b) \text{ prior} ~~ + ~~ \text{Bernoulli likelihood} ~~ \implies ~~ \text{Beta}(a + Pos, b + Neg) \text{ posterior} \end{align*} We can take advantage of this and work with non-uniform priors. For example, suppose we believe that products in this specific marketplace are never of quality below 0.8 and typically of quality 0.9, we can incorporate this as a $\text{Beta}(a, b)$ prior as follows. We set the mean $\frac{a}{a+b}$ to be equal to 0.9 and the standard deviation to be equal to $0.05$ (so that two standard deviations cover the gap between the typical value 0.9 and the realistic lowest value $0.8$): \begin{align*} \frac{a}{a+b} = 0.9 ~~ \text{ and } ~~ \frac{a}{a+b} \frac{b}{a+b} \frac{1}{a+b+1} = (0.05)^2 = 0.0025 \end{align*} The first equation above gives $a = 9 b$ and the second equation gives \begin{align*} \frac{0.9 \times 0.1}{a+b+1} = 0.0025 \implies a+b = 35. \end{align*} Combining with $a = 9b$, we deduce $a = 31.5$ and $b = 3.5$. This prior and the corresponding posterior for specific values of $Pos$ and $Neg$ can be plotted as follows.

In [2]:
# Define the range of values
x = np.linspace(0, 1, 1000)
a = 31.5
b = 3.5
pos = 3
neg = 0
pdf1 = beta.pdf(x, a, b) #prior density
pdf2 = beta.pdf(x, a + pos, b + neg) #posterior density

# Plot the densities
plt.figure(figsize=(10, 6))
plt.plot(x, pdf1, label='Prior', color='blue')
plt.plot(x, pdf2, label='Posterior', color='red')
plt.title('Prior and Posterior')
plt.xlabel('x')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()
#Repeat these plots for other values of pos and neg such as pos = 19, neg = 1 and also for pos = 1, neg = 19
No description has been provided for this image

Comparison between Naive Estimates (proportions) and Bayesian Estimates¶

Recall again that \begin{align*} \text{naive estimate} = \text{proportion} = \frac{Pos}{Tot} = \frac{Pos}{Pos+Neg} \end{align*} The Bayesian does not use the naive estimate but instead summarizes the posterior uncertainty in the form of the $\text{Beta}(a + Pos, b + Neg)$ density. One commonly used Bayesian Estimate is the posterior mean which is \begin{align*} \text{Bayesian estimate} = \frac{a + Pos}{a + b + Tot} = \frac{a + Pos}{a + b + Pos + Neg} \end{align*} These posterior means can be much better summaries for the quality of each microwave compared to the naive proportion. For example, when $Pos = 3$ and $Neg = 0$ the posterior mean becomes $\frac{a+3}{a+b+3}$ which, for the uniform prior ($a = b = 1$), becomes $\frac{4}{5} = 0.8$. $0.8$ is probably a more reasonable estimate for $\theta$ in this case compared to the the naive proportion which equals 1.

The following is a very insightful equation relating the naive estimate and the Bayesian estimate (this relation also involves the prior mean $\frac{a}{a+b}$): \begin{align*} \frac{a + Pos}{a + b + Tot} = \left(\frac{a+b}{a+b+Tot} \right) \frac{a}{a+b} + \left(\frac{Tot}{a+b+Tot} \right) \frac{Pos}{Tot} \end{align*} which is equivalent to \begin{align*} \text{Bayesian Estimate} = \left(\frac{a+b}{a+b+Tot} \right) \text{Prior Mean} + \left(\frac{Tot}{a+b+Tot} \right) \text{Naive Estimate} \end{align*} Letting \begin{align*} w = \frac{a+b}{a+b+Tot}, \end{align*} we can rewrite the equation as \begin{align*} \text{Bayesian Estimate} = w \times \text{Prior Mean} + \left(1- w\right) \times \text{Naive Estimate} \end{align*} This leads to following observations:

  1. The Bayesian estimate is a weighted average of the prior mean and the naive estimate
  2. If the prior mean and the naive estimate are close to each other, then the Bayesian estimate will also be close to the naive estimate (as well as the prior mean)
  3. If the prior mean and the naive estimate are different, then the Bayesian estimate will push the naive estimate towards the prior mean. The amount by which the Bayesian estimate will be pushed towards the prior mean will depend on the size of the weight $w = \frac{a+b}{a+b+Tot}$. If this weight is close to 1, then the Bayesian estimate will be very close to the prior mean. On the other hand, if this weight is close to 0, the Bayesian estimate will remain close to the naive estimate.
  4. The size of $Tot$ (total number of observations or, in the specific product quality estimation example, the total nunber of reviews) relative to $a+b$ controls the size of the weight $w$. If $Tot$ is large relative to $a+b$, then the weight is close to 0.

Kidney Cancer Dataset¶

Now we shall study a real dataset where naive proportions can be uninformative or even misleading. This is an example on "Bayesian Disease Mapping", and is taken from Section 2.7 on estimation of kidney cancer rates of the book "Bayesian Data Analysis Edition 3" by Gelman et al. Some of the code is taken from a blog post by Robin Ryder (google this).

In [3]:
#Read the kidney cancer dataset
import pandas as pd

d_full = pd.read_csv("KidneyCancerClean.csv", skiprows=4)
display(d_full.head(15))
summary = d_full.describe(include = 'all')
display(summary)
Unnamed: 0 state Location fips dc dcV pop popV aadc aadcV ... good dc.2 dcV.2 pop.2 popV.2 aadc.2 aadcV.2 dcC.2 dcCV.2 good.2
0 108 ALABAMA Autauga County, Alabama 1001 2 VALID 61921 VALID 3.3 UNRELIABLE ... 1 1 VALID 64915 VALID 1.7 UNRELIABLE 1.5 UNRELIABLE 1
1 117 ALABAMA Baldwin County, Alabama 1003 7 VALID 170945 VALID 3.5 UNRELIABLE ... 1 15 VALID 195253 VALID 6.3 UNRELIABLE 7.7 UNRELIABLE 1
2 129 ALABAMA Barbour County, Alabama 1005 0 VALID 33316 VALID 0.0 UNRELIABLE ... 1 1 VALID 33987 VALID 2.6 UNRELIABLE 2.9 UNRELIABLE 1
3 199 ALABAMA Bibb County, Alabama 1007 0 VALID 30152 VALID 0.0 UNRELIABLE ... 1 1 VALID 31175 VALID 2.9 UNRELIABLE 3.2 UNRELIABLE 1
4 219 ALABAMA Blount County, Alabama 1009 3 VALID 88342 VALID 3.2 UNRELIABLE ... 1 5 VALID 91547 VALID 4.8 UNRELIABLE 5.5 UNRELIABLE 1
5 306 ALABAMA Bullock County, Alabama 1011 0 VALID 8313 VALID 0.0 UNRELIABLE ... 1 0 VALID 8197 VALID 0.0 UNRELIABLE 0.0 UNRELIABLE 1
6 318 ALABAMA Butler County, Alabama 1013 0 VALID 31963 VALID 0.0 UNRELIABLE ... 1 1 VALID 31722 VALID 2.8 UNRELIABLE 3.2 UNRELIABLE 1
7 343 ALABAMA Calhoun County, Alabama 1015 9 VALID 243105 VALID 4.0 UNRELIABLE ... 1 12 VALID 233021 VALID 5.4 UNRELIABLE 5.1 UNRELIABLE 1
8 439 ALABAMA Chambers County, Alabama 1017 7 VALID 59985 VALID 8.3 UNRELIABLE ... 1 0 VALID 57813 VALID 0.0 UNRELIABLE 0.0 UNRELIABLE 1
9 467 ALABAMA Cherokee County, Alabama 1019 0 VALID 43401 VALID 0.0 UNRELIABLE ... 1 0 VALID 43828 VALID 0.0 UNRELIABLE 0.0 UNRELIABLE 1
10 490 ALABAMA Chilton County, Alabama 1021 2 VALID 65792 VALID 3.1 UNRELIABLE ... 1 3 VALID 68837 VALID 3.8 UNRELIABLE 4.4 UNRELIABLE 1
11 496 ALABAMA Choctaw County, Alabama 1023 1 VALID 22741 VALID 3.7 UNRELIABLE ... 1 3 VALID 22441 VALID 11.4 UNRELIABLE 13.4 UNRELIABLE 1
12 527 ALABAMA Clarke County, Alabama 1025 1 VALID 38611 VALID 2.5 UNRELIABLE ... 1 3 VALID 37772 VALID 6.7 UNRELIABLE 7.9 UNRELIABLE 1
13 533 ALABAMA Clay County, Alabama 1027 3 VALID 27846 VALID 8.1 UNRELIABLE ... 1 3 VALID 26954 VALID 8.9 UNRELIABLE 11.1 UNRELIABLE 1
14 557 ALABAMA Cleburne County, Alabama 1029 2 VALID 29718 VALID 5.9 UNRELIABLE ... 1 1 VALID 29735 VALID 3.1 UNRELIABLE 3.4 UNRELIABLE 1

15 rows × 22 columns

Unnamed: 0 state Location fips dc dcV pop popV aadc aadcV ... good dc.2 dcV.2 pop.2 popV.2 aadc.2 aadcV.2 dcC.2 dcCV.2 good.2
count 3110.000000 3110 3110 3110.000000 3110.000000 3110 3.110000e+03 3110 3110.000000 3110 ... 3110.000000 3110.000000 3110 3.110000e+03 3110 3110.000000 3110 3109.000000 3109 3110.000000
unique NaN 50 3109 NaN NaN 1 NaN 1 NaN 2 ... NaN NaN 1 NaN 1 NaN 3 NaN 2 NaN
top NaN TEXAS Bedford County, Virginia NaN NaN VALID NaN VALID NaN UNRELIABLE ... NaN NaN VALID NaN VALID NaN UNRELIABLE NaN UNRELIABLE NaN
freq NaN 254 2 NaN NaN 3110 NaN 3110 NaN 2882 ... NaN NaN 3110 NaN 3110 NaN 2848 NaN 2848 NaN
mean 1555.500000 NaN NaN 30672.690675 7.527974 NaN 1.550087e+05 NaN 4.438392 NaN ... 0.985852 8.359164 NaN 1.606611e+05 NaN 4.834244 NaN 5.787359 NaN 0.985852
std 897.923995 NaN NaN 14980.587745 23.095402 NaN 4.722345e+05 NaN 3.501390 NaN ... 0.118120 23.765728 NaN 4.972750e+05 NaN 3.709540 NaN 5.030156 NaN 0.118120
min 1.000000 NaN NaN 1001.000000 0.000000 NaN 2.580000e+02 NaN 0.000000 NaN ... 0.000000 0.000000 NaN 2.670000e+02 NaN 0.000000 NaN 0.000000 NaN 0.000000
25% 778.250000 NaN NaN 19041.500000 1.000000 NaN 2.238250e+04 NaN 2.100000 NaN ... 1.000000 1.000000 NaN 2.213000e+04 NaN 2.525000 NaN 2.800000 NaN 1.000000
50% 1555.500000 NaN NaN 29210.000000 2.000000 NaN 4.762800e+04 NaN 4.300000 NaN ... 1.000000 3.000000 NaN 4.806950e+04 NaN 4.600000 NaN 5.200000 NaN 1.000000
75% 2332.750000 NaN NaN 46008.500000 6.000000 NaN 1.146262e+05 NaN 6.100000 NaN ... 1.000000 7.000000 NaN 1.179355e+05 NaN 6.600000 NaN 7.700000 NaN 1.000000
max 3110.000000 NaN NaN 56045.000000 647.000000 NaN 1.526640e+07 NaN 49.800000 NaN ... 1.000000 598.000000 NaN 1.660789e+07 NaN 44.600000 NaN 71.800000 NaN 1.000000

11 rows × 22 columns

This is county-level data on population and deaths due to kidney cancer. There are quite a few columns in the dataset but we shall only work with the four columns: dc, dc.2, pop and pop.2:

  1. The columns dc and dc.2 represent death counts due to kidney cancer in 1980-84 and 1985-89 respectively.
  2. The columns pop and pop.2 represent some measure of the average population in 1980-84 and 1985-89 respectively.

Let us create a smaller dataframe with only these columns.

In [4]:
d = d_full[['state', 'Location', 'fips', 'dc', 'pop', 'dc.2', 'pop.2']].copy()
display(d.head(20))
state Location fips dc pop dc.2 pop.2
0 ALABAMA Autauga County, Alabama 1001 2 61921 1 64915
1 ALABAMA Baldwin County, Alabama 1003 7 170945 15 195253
2 ALABAMA Barbour County, Alabama 1005 0 33316 1 33987
3 ALABAMA Bibb County, Alabama 1007 0 30152 1 31175
4 ALABAMA Blount County, Alabama 1009 3 88342 5 91547
5 ALABAMA Bullock County, Alabama 1011 0 8313 0 8197
6 ALABAMA Butler County, Alabama 1013 0 31963 1 31722
7 ALABAMA Calhoun County, Alabama 1015 9 243105 12 233021
8 ALABAMA Chambers County, Alabama 1017 7 59985 0 57813
9 ALABAMA Cherokee County, Alabama 1019 0 43401 0 43828
10 ALABAMA Chilton County, Alabama 1021 2 65792 3 68837
11 ALABAMA Choctaw County, Alabama 1023 1 22741 3 22441
12 ALABAMA Clarke County, Alabama 1025 1 38611 3 37772
13 ALABAMA Clay County, Alabama 1027 3 27846 3 26954
14 ALABAMA Cleburne County, Alabama 1029 2 29718 1 29735
15 ALABAMA Coffee County, Alabama 1031 3 80334 4 81882
16 ALABAMA Colbert County, Alabama 1033 3 108797 6 104971
17 ALABAMA Conecuh County, Alabama 1035 2 22009 2 20687
18 ALABAMA Coosa County, Alabama 1037 2 18003 1 17828
19 ALABAMA Covington County, Alabama 1039 4 76229 3 75990

The purpose of this data analysis is to estimate the kidney cancer death rates for each county and, more importantly, to flag the counties with high kidney cancer death rates. Before proceeding to the analysis, let us first combine the death counts for the two periods 80-84 and 85-89 to get one overall death count for each county. Further, let us take the average of pop and pop.2 to get a measure of the average population of each county in the period 80-89.

In [5]:
# Combine the death and population counts for the two periods 80-84 and 85-89
d['dct'] = d['dc'] + d['dc.2'] #dct stands for death count total
d['popm'] = (d['pop'] + d['pop.2']) / 2
display(d.head(15))
state Location fips dc pop dc.2 pop.2 dct popm
0 ALABAMA Autauga County, Alabama 1001 2 61921 1 64915 3 63418.0
1 ALABAMA Baldwin County, Alabama 1003 7 170945 15 195253 22 183099.0
2 ALABAMA Barbour County, Alabama 1005 0 33316 1 33987 1 33651.5
3 ALABAMA Bibb County, Alabama 1007 0 30152 1 31175 1 30663.5
4 ALABAMA Blount County, Alabama 1009 3 88342 5 91547 8 89944.5
5 ALABAMA Bullock County, Alabama 1011 0 8313 0 8197 0 8255.0
6 ALABAMA Butler County, Alabama 1013 0 31963 1 31722 1 31842.5
7 ALABAMA Calhoun County, Alabama 1015 9 243105 12 233021 21 238063.0
8 ALABAMA Chambers County, Alabama 1017 7 59985 0 57813 7 58899.0
9 ALABAMA Cherokee County, Alabama 1019 0 43401 0 43828 0 43614.5
10 ALABAMA Chilton County, Alabama 1021 2 65792 3 68837 5 67314.5
11 ALABAMA Choctaw County, Alabama 1023 1 22741 3 22441 4 22591.0
12 ALABAMA Clarke County, Alabama 1025 1 38611 3 37772 4 38191.5
13 ALABAMA Clay County, Alabama 1027 3 27846 3 26954 6 27400.0
14 ALABAMA Cleburne County, Alabama 1029 2 29718 1 29735 3 29726.5

A simple way to estimate the kidney cancer rate of each county is to use the proportion: \begin{align*} \text{Naive Estimate of Kidney Cancer Death Rate} = \frac{\text{Number of Kidney Cancer Deaths}}{\text{Population}} \end{align*} We can then rank all the counties according to their naive proportions and flag the counties for whom the naive proportion is high.

In [6]:
d['naiveproportion'] = d['dct'] / d['popm']
summary = d['naiveproportion'].describe()
print(summary)
#Histogram of naive proportion estimates:
import matplotlib.pyplot as plt
plt.hist(d['naiveproportion'], bins=300)
plt.xlabel('naiveproportion')
plt.ylabel('Frequency')
plt.title('Histogram of Naive Proportions')
plt.grid(True)
plt.show()
#Some observations on the histogram: 
#There is a spike at zero (corresponding to counties with no kidney cancer deaths)
#Most proportions are around 0.0001 but there are a few counties which have large proportions (more than 0.0004)
count    3110.000000
mean        0.000108
std         0.000068
min         0.000000
25%         0.000067
50%         0.000100
75%         0.000139
max         0.000691
Name: naiveproportion, dtype: float64
No description has been provided for this image
In [7]:
#Flagging Counties with High Kidney Cancer Death Proportions. 
#One way to do this is to get the counties with the 100 highest naive proportion:
threshold = d['naiveproportion'].nlargest(100).iloc[-1]
print(threshold)

d['cancerhigh'] = d['naiveproportion'] >= threshold
with pd.option_context('display.max_rows', None):
    display(d[d['cancerhigh'] >= True])
0.00025143527636927463
state Location fips dc pop dc.2 pop.2 dct popm naiveproportion cancerhigh
65 ALABAMA Wilcox County, Alabama 1131 2 11144 1 10727 3 10935.5 0.000274 True
148 ARKANSAS Sharp County, Arkansas 5135 4 34922 5 33806 9 34364.0 0.000262 True
149 ARKANSAS Stone County, Arkansas 5137 1 22387 5 23148 6 22767.5 0.000264 True
230 COLORADO Dolores County, Colorado 8033 1 4151 1 3797 2 3974.0 0.000503 True
473 GEORGIA Quitman County, Georgia 13239 0 2524 1 2557 1 2540.5 0.000394 True
485 GEORGIA Talbot County, Georgia 13263 1 5855 1 6008 2 5931.5 0.000337 True
495 GEORGIA Treutlen County, Georgia 13283 1 9944 3 9695 4 9819.5 0.000407 True
512 GEORGIA Wilkes County, Georgia 13317 3 14334 1 14223 4 14278.5 0.000280 True
535 IDAHO Clark County, Idaho 16033 1 2212 0 2176 1 2194.0 0.000456 True
549 IDAHO Lewis County, Idaho 16061 2 9705 1 8634 3 9169.5 0.000327 True
570 ILLINOIS Carroll County, Illinois 17015 5 44858 7 41446 12 43152.0 0.000278 True
575 ILLINOIS Clay County, Illinois 17025 8 37117 5 35052 13 36084.5 0.000360 True
595 ILLINOIS Hamilton County, Illinois 17065 2 21935 4 20838 6 21386.5 0.000281 True
793 IOWA Greene County, Iowa 19073 4 28283 4 25160 8 26721.5 0.000299 True
836 IOWA Ringgold County, Iowa 19159 2 14228 3 13013 5 13620.5 0.000367 True
839 IOWA Shelby County, Iowa 19165 4 36486 5 33239 9 34862.5 0.000258 True
844 IOWA Union County, Iowa 19175 7 32300 2 30043 9 31171.5 0.000289 True
854 IOWA Worth County, Iowa 19195 1 21765 5 20094 6 20929.5 0.000287 True
868 KANSAS Clark County, Kansas 20025 2 6192 0 5877 2 6034.5 0.000331 True
882 KANSAS Ellsworth County, Kansas 20053 2 15935 2 15846 4 15890.5 0.000252 True
892 KANSAS Greenwood County, Kansas 20073 3 20883 3 19326 6 20104.5 0.000298 True
935 KANSAS Rice County, Kansas 20159 5 27283 5 25668 10 26475.5 0.000378 True
951 KANSAS Sumner County, Kansas 20191 5 60141 12 60620 17 60380.5 0.000282 True
959 KANSAS Woodson County, Kansas 20207 0 11148 3 10227 3 10687.5 0.000281 True
1025 KENTUCKY Lee County, Kentucky 21129 2 18961 3 18278 5 18619.5 0.000269 True
1071 KENTUCKY Trigg County, Kentucky 21221 1 20577 5 21924 6 21250.5 0.000282 True
1199 MICHIGAN Alcona County, Michigan 26001 2 23953 5 24695 7 24324.0 0.000288 True
1246 MICHIGAN Luce County, Michigan 26095 2 14850 3 13410 5 14130.0 0.000354 True
1258 MICHIGAN Montmorency County, Michigan 26119 2 18711 3 20070 5 19390.5 0.000258 True
1270 MICHIGAN Roscommon County, Michigan 26143 5 41957 6 44959 11 43458.0 0.000253 True
1316 MINNESOTA Kittson County, Minnesota 27069 1 16438 3 14973 4 15705.5 0.000255 True
1372 MISSISSIPPI Attala County, Mississippi 28007 5 27969 2 27225 7 27597.0 0.000254 True
1376 MISSISSIPPI Carroll County, Mississippi 28015 1 12952 3 13422 4 13187.0 0.000303 True
1379 MISSISSIPPI Claiborne County, Mississippi 28021 0 7405 2 5844 2 6624.5 0.000302 True
1400 MISSISSIPPI Jefferson County, Mississippi 28063 0 3927 2 3305 2 3616.0 0.000553 True
1403 MISSISSIPPI Kemper County, Mississippi 28069 2 11047 1 10990 3 11018.5 0.000272 True
1417 MISSISSIPPI Montgomery County, Mississippi 28097 0 18294 6 17060 6 17677.0 0.000339 True
1428 MISSISSIPPI Quitman County, Mississippi 28119 1 12630 2 11163 3 11896.5 0.000252 True
1440 MISSISSIPPI Tunica County, Mississippi 28143 0 5776 2 5006 2 5391.0 0.000371 True
1457 MISSOURI Bates County, Missouri 29013 9 37094 2 36524 11 36809.0 0.000299 True
1487 MISSOURI Gasconade County, Missouri 29073 6 32412 7 33035 13 32723.5 0.000397 True
1512 MISSOURI Maries County, Missouri 29125 4 19261 1 19702 5 19481.5 0.000257 True
1519 MISSOURI Monroe County, Missouri 29137 2 22560 4 21524 6 22042.0 0.000272 True
1527 MISSOURI Ozark County, Missouri 29153 1 20050 5 20674 6 20362.0 0.000295 True
1553 MISSOURI Shelby County, Missouri 29205 1 18180 5 16697 6 17438.5 0.000344 True
1564 MISSOURI Worth County, Missouri 29227 1 7016 1 6153 2 6584.5 0.000304 True
1575 MONTANA Daniels County, Montana 30019 1 6864 1 6013 2 6438.5 0.000311 True
1583 MONTANA Glacier County, Montana 30035 2 14163 3 13601 5 13882.0 0.000360 True
1585 MONTANA Granite County, Montana 30039 0 6891 2 6791 2 6841.0 0.000292 True
1594 MONTANA McCone County, Montana 30055 1 7136 1 6258 2 6697.0 0.000299 True
1608 MONTANA Roosevelt County, Montana 30085 2 16259 3 14967 5 15613.0 0.000320 True
1620 MONTANA Wibaux County, Montana 30109 0 3897 1 3216 1 3556.5 0.000281 True
1630 NEBRASKA Brown County, Nebraska 31017 1 10514 2 9364 3 9939.0 0.000302 True
1635 NEBRASKA Cedar County, Nebraska 31027 4 27890 4 25872 8 26881.0 0.000298 True
1651 NEBRASKA Fillmore County, Nebraska 31059 2 18928 3 17637 5 18282.5 0.000273 True
1652 NEBRASKA Franklin County, Nebraska 31061 2 10530 2 9843 4 10186.5 0.000393 True
1681 NEBRASKA McPherson County, Nebraska 31117 0 1504 1 1392 1 1448.0 0.000691 True
1721 NEVADA Eureka County, Nevada 32011 1 3360 0 3508 1 3434.0 0.000291 True
1893 NORTH.CAROLINA Gates County, North Carolina 37073 0 10873 3 12298 3 11585.5 0.000259 True
1949 NORTH.CAROLINA Warren County, North Carolina 37185 3 14587 2 15707 5 15147.0 0.000330 True
1962 NORTH.DAKOTA Bowman County, North Dakota 38011 1 10659 3 9734 4 10196.5 0.000392 True
1970 NORTH.DAKOTA Eddy County, North Dakota 38027 0 8605 3 7780 3 8192.5 0.000366 True
1994 NORTH.DAKOTA Renville County, North Dakota 38075 1 9224 2 8504 3 8864.0 0.000338 True
2008 NORTH.DAKOTA Wells County, North Dakota 38103 2 17064 3 15698 5 16381.0 0.000305 True
2114 OKLAHOMA Cotton County, Oklahoma 40033 2 15669 2 14707 4 15188.0 0.000263 True
2119 OKLAHOMA Dewey County, Oklahoma 40043 5 14299 1 13397 6 13848.0 0.000433 True
2128 OKLAHOMA Haskell County, Oklahoma 40061 5 24288 1 23418 6 23853.0 0.000252 True
2135 OKLAHOMA Kiowa County, Oklahoma 40075 6 27698 3 24807 9 26252.5 0.000343 True
2146 OKLAHOMA McIntosh County, Oklahoma 40091 5 31374 3 31831 8 31602.5 0.000253 True
2185 OREGON Gilliam County, Oregon 41021 1 4928 1 4273 2 4600.5 0.000435 True
2209 OREGON Wheeler County, Oregon 41069 0 3700 1 3466 1 3583.0 0.000279 True
2283 SOUTH.CAROLINA Abbeville County, South Carolina 45001 5 37296 5 38019 10 37657.5 0.000266 True
2357 SOUTH.DAKOTA Hand County, South Dakota 46059 2 11996 1 11167 3 11581.5 0.000259 True
2358 SOUTH.DAKOTA Hanson County, South Dakota 46061 2 8416 1 7828 3 8122.0 0.000369 True
2362 SOUTH.DAKOTA Hyde County, South Dakota 46069 1 4584 1 4101 2 4342.5 0.000461 True
2365 SOUTH.DAKOTA Jones County, South Dakota 46075 1 3632 0 3626 1 3629.0 0.000276 True
2372 SOUTH.DAKOTA McCook County, South Dakota 46087 3 15383 2 14292 5 14837.5 0.000337 True
2382 SOUTH.DAKOTA Potter County, South Dakota 46107 2 8964 2 8407 4 8685.5 0.000461 True
2389 SOUTH.DAKOTA Tripp County, South Dakota 46123 3 16481 3 16100 6 16290.5 0.000368 True
2442 TENNESSEE Lake County, Tennessee 47095 2 15750 2 14613 4 15181.5 0.000263 True
2497 TEXAS Austin County, Texas 48015 8 40725 4 42896 12 41810.5 0.000287 True
2540 TEXAS Cottle County, Texas 48101 2 6337 0 5473 2 5905.0 0.000339 True
2552 TEXAS Dickens County, Texas 48125 2 7618 0 6507 2 7062.5 0.000283 True
2578 TEXAS Gonzales County, Texas 48177 5 39364 8 39825 13 39594.5 0.000328 True
2621 TEXAS Kent County, Texas 48263 0 2857 1 2606 1 2731.5 0.000366 True
2627 TEXAS Knox County, Texas 48275 3 12332 0 11531 3 11931.5 0.000251 True
2639 TEXAS Llano County, Texas 48299 3 25352 4 28130 7 26741.0 0.000262 True
2646 TEXAS Mason County, Texas 48319 2 8609 1 8443 3 8526.0 0.000352 True
2656 TEXAS Mills County, Texas 48333 2 10962 2 11028 4 10995.0 0.000364 True
2692 TEXAS San Augustine County, Texas 48405 2 15026 2 14385 4 14705.5 0.000272 True
2695 TEXAS San Saba County, Texas 48411 4 14253 0 13744 4 13998.5 0.000286 True
2698 TEXAS Shackelford County, Texas 48417 1 10038 2 9248 3 9643.0 0.000311 True
2728 TEXAS Washington County, Texas 48477 5 45594 7 49721 12 47657.5 0.000252 True
2802 VIRGINIA Brunswick County, Virginia 51025 3 16201 3 16430 6 16315.5 0.000368 True
2816 VIRGINIA Covington City, Virginia 51580 0 16971 5 14720 5 15845.5 0.000316 True
2824 VIRGINIA Essex County, Virginia 51057 3 12481 1 12625 4 12553.0 0.000319 True
2848 VIRGINIA Highland County, Virginia 51091 1 7157 2 6608 3 6882.5 0.000436 True
2968 WEST.VIRGINIA Doddridge County, West Virginia 54017 2 18806 3 17969 5 18387.5 0.000272 True
3033 WISCONSIN Florence County, Wisconsin 55037 2 10572 2 11035 4 10803.5 0.000370 True
3084 WISCONSIN Waushara County, Wisconsin 55137 5 45859 8 46906 13 46382.5 0.000280 True
In [9]:
#It will be cool to plot the high cancer counties (in terms of the naive proportions) on the US map
#This can be done in the following way.
import plotly.express as px
#First convert fips to standard form:
d['modfips'] = d['fips'].apply(lambda x: str(x).zfill(5))
#Plotting the counties with high cancer proportion:
fig = px.choropleth(
    d, 
    geojson="https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json", # This is a dataset containing county geometries
    locations='modfips', # Use the 'fips' column for county identification
    color='cancerhigh',
    hover_name='Location',  # Display the 'Location' column on hover
    scope="usa",
    title="Top 100 kidney cancer death rate counties (in terms of naive proportions)",
    color_continuous_scale="YlOrRd"
)
fig.update_geos(fitbounds="locations")
fig.update_layout(margin={"r":0,"t":40,"l":0,"b":0})
fig.show()

Question: Is this a good way of flagging the high kidney cancer rate counties? Should we now look into specific reasons (such as dietary patterns, pollution etc, lack of access to quality medical care etc.) why these counties have high rates?

It turns out that this analysis is misleading because of the use of naive proportions as estimates of kidney cancer death rates. In particular, the counties that we flagged are mostly counties with low populations. This can be understood as follows. The typical rate of kidney cancer is about 0.0001 as can be seen, for example, from the naive proportion estimates of counties with large populations.

In [10]:
sorted_populations = d['popm'].sort_values()
print(list(sorted_populations))
#Select counties with large populations
#By large population, let us take populations more than 300000
largecounties = d[d['popm'] >= 300000]
largecounties.shape[0]
#So there are 323 counties with population more than 300000
#Let us evaluate the naive proportions for these counties. 
summary = d[d['popm'] >= 300000]['naiveproportion'].describe()
display(summary)
#the mean and median are about 0.0001 which can therefore be viewed as a typical value of the kidney cancer rate
[262.5, 978.0, 1017.0, 1139.0, 1233.5, 1268.0, 1297.0, 1448.0, 1570.5, 1594.0, 1640.5, 1862.5, 1879.0, 1942.0, 1946.5, 2058.5, 2108.0, 2160.5, 2171.0, 2175.5, 2194.0, 2250.0, 2296.0, 2337.0, 2342.5, 2353.0, 2358.5, 2473.5, 2522.5, 2540.5, 2542.5, 2547.5, 2570.5, 2731.0, 2731.5, 2817.5, 2866.5, 3005.5, 3027.0, 3095.0, 3102.5, 3268.5, 3294.5, 3305.0, 3433.0, 3434.0, 3532.5, 3556.5, 3583.0, 3596.5, 3616.0, 3629.0, 3819.0, 3974.0, 4044.5, 4076.0, 4129.0, 4182.0, 4188.5, 4224.0, 4341.5, 4342.5, 4349.0, 4351.0, 4448.5, 4449.5, 4450.0, 4465.0, 4487.0, 4493.5, 4518.5, 4527.5, 4571.5, 4600.5, 4632.5, 4695.5, 4715.5, 4784.5, 4832.0, 4836.5, 4952.0, 4956.5, 5169.0, 5177.5, 5236.0, 5256.5, 5268.0, 5282.0, 5284.5, 5309.5, 5311.5, 5343.5, 5359.0, 5391.0, 5404.5, 5407.5, 5424.5, 5435.5, 5478.0, 5529.5, 5594.0, 5596.5, 5602.5, 5619.0, 5712.0, 5730.0, 5743.0, 5786.0, 5789.5, 5808.0, 5849.0, 5853.5, 5899.0, 5902.0, 5905.0, 5916.0, 5925.5, 5931.5, 6012.5, 6034.5, 6097.0, 6111.0, 6115.0, 6236.0, 6315.0, 6428.5, 6438.5, 6441.0, 6441.5, 6489.5, 6511.5, 6511.5, 6584.5, 6624.5, 6626.0, 6677.0, 6697.0, 6771.5, 6814.5, 6836.0, 6841.0, 6882.5, 7045.0, 7062.5, 7145.0, 7145.5, 7168.0, 7216.0, 7257.0, 7294.5, 7440.0, 7446.5, 7496.0, 7643.0, 7649.5, 7678.5, 7696.5, 7712.0, 7749.5, 7775.5, 7807.5, 7942.0, 7969.5, 8009.0, 8066.5, 8122.0, 8130.0, 8135.0, 8170.0, 8192.5, 8218.5, 8230.5, 8231.0, 8255.0, 8289.5, 8310.0, 8343.0, 8347.5, 8354.0, 8368.0, 8378.0, 8397.0, 8412.5, 8441.0, 8444.0, 8523.5, 8526.0, 8554.5, 8566.0, 8576.0, 8609.0, 8642.0, 8650.5, 8679.5, 8681.5, 8685.5, 8691.5, 8719.0, 8739.0, 8755.5, 8864.0, 8880.0, 8933.5, 8945.5, 8962.5, 9004.5, 9058.5, 9081.5, 9127.5, 9159.0, 9169.5, 9192.5, 9241.5, 9247.0, 9248.5, 9266.5, 9276.5, 9373.0, 9390.5, 9418.0, 9450.5, 9512.5, 9536.5, 9595.5, 9597.5, 9615.0, 9643.0, 9675.0, 9706.5, 9784.5, 9793.0, 9819.5, 9827.5, 9839.0, 9858.5, 9897.0, 9898.0, 9915.5, 9939.0, 9940.0, 9950.0, 9962.0, 10014.0, 10064.5, 10117.0, 10126.0, 10186.5, 10195.0, 10196.5, 10255.0, 10258.0, 10260.5, 10339.0, 10377.0, 10394.0, 10402.0, 10427.5, 10464.0, 10503.0, 10513.0, 10554.0, 10584.5, 10619.5, 10677.0, 10686.0, 10687.5, 10699.5, 10702.5, 10703.0, 10760.0, 10777.0, 10796.0, 10803.5, 10849.0, 10868.0, 10880.0, 10892.5, 10910.0, 10935.5, 10937.5, 10955.5, 10995.0, 11018.5, 11028.0, 11035.5, 11043.0, 11101.5, 11114.5, 11117.5, 11135.5, 11150.5, 11202.5, 11206.5, 11220.0, 11250.0, 11305.0, 11311.5, 11316.5, 11330.0, 11336.0, 11387.0, 11405.0, 11473.0, 11498.0, 11561.0, 11581.5, 11582.5, 11585.5, 11596.5, 11653.5, 11661.0, 11675.0, 11713.0, 11729.5, 11731.0, 11745.5, 11751.5, 11758.5, 11758.5, 11853.5, 11888.0, 11896.5, 11930.5, 11930.5, 11931.5, 11952.0, 11982.0, 12008.5, 12024.0, 12077.0, 12081.0, 12102.5, 12158.5, 12183.0, 12195.5, 12205.0, 12227.5, 12253.5, 12294.0, 12333.0, 12341.0, 12348.0, 12351.0, 12354.0, 12370.5, 12392.5, 12417.0, 12438.5, 12454.0, 12470.0, 12472.5, 12483.5, 12529.0, 12553.0, 12561.5, 12576.5, 12579.0, 12595.5, 12629.5, 12654.0, 12681.5, 12732.5, 12747.0, 12759.5, 12791.0, 12797.0, 12799.5, 12862.0, 12864.0, 12866.0, 12897.5, 12936.5, 12941.0, 12944.0, 12965.0, 13030.0, 13036.0, 13071.5, 13109.5, 13116.0, 13118.0, 13174.0, 13177.0, 13187.0, 13188.0, 13208.5, 13219.0, 13294.5, 13311.5, 13313.0, 13349.0, 13386.0, 13388.5, 13389.5, 13442.0, 13451.0, 13481.5, 13486.5, 13492.0, 13498.5, 13511.5, 13519.5, 13526.0, 13575.5, 13592.0, 13620.5, 13639.0, 13669.5, 13677.5, 13688.0, 13695.5, 13698.0, 13773.5, 13773.5, 13817.0, 13818.0, 13832.5, 13848.0, 13882.0, 13950.0, 13975.5, 13988.0, 13998.5, 14083.5, 14129.0, 14130.0, 14132.0, 14198.5, 14211.0, 14214.5, 14238.0, 14278.5, 14290.0, 14312.0, 14313.0, 14324.0, 14346.0, 14383.5, 14384.5, 14574.0, 14580.0, 14601.0, 14608.0, 14628.0, 14638.0, 14649.5, 14653.0, 14684.5, 14705.5, 14746.0, 14754.5, 14774.5, 14837.5, 14864.0, 14901.5, 14942.5, 14962.5, 14978.5, 14983.0, 15025.5, 15122.5, 15147.0, 15167.5, 15177.5, 15181.5, 15188.0, 15194.5, 15241.0, 15243.5, 15277.0, 15309.5, 15317.5, 15321.0, 15388.5, 15390.5, 15396.0, 15433.5, 15457.0, 15518.0, 15521.0, 15548.5, 15566.0, 15582.0, 15589.5, 15613.0, 15637.0, 15645.0, 15676.5, 15685.5, 15705.5, 15709.5, 15727.0, 15734.0, 15845.5, 15854.5, 15855.5, 15880.0, 15883.5, 15888.0, 15890.5, 16094.5, 16121.5, 16163.5, 16170.0, 16245.0, 16272.5, 16290.5, 16299.5, 16305.5, 16315.5, 16361.0, 16381.0, 16396.0, 16418.5, 16448.5, 16457.0, 16458.0, 16473.0, 16527.0, 16542.0, 16554.5, 16565.5, 16576.0, 16596.0, 16611.5, 16612.5, 16640.5, 16663.0, 16666.5, 16688.0, 16716.5, 16743.0, 16773.0, 16777.5, 16785.5, 16801.5, 16836.0, 16850.0, 16854.5, 16881.0, 16892.5, 16899.5, 16918.5, 16920.5, 16940.5, 17000.5, 17014.5, 17057.0, 17077.5, 17086.0, 17095.0, 17098.5, 17099.0, 17147.0, 17155.5, 17196.0, 17241.0, 17295.0, 17327.0, 17369.5, 17433.5, 17435.5, 17438.5, 17482.5, 17492.5, 17508.0, 17551.0, 17649.0, 17660.0, 17677.0, 17690.0, 17704.5, 17721.5, 17736.0, 17765.5, 17794.0, 17795.5, 17801.0, 17812.0, 17872.0, 17886.5, 17892.0, 17896.0, 17915.5, 17943.5, 17970.5, 18028.5, 18051.5, 18076.0, 18080.0, 18092.0, 18102.5, 18121.5, 18148.0, 18181.5, 18217.5, 18279.0, 18282.5, 18291.0, 18361.5, 18362.0, 18387.5, 18387.5, 18392.0, 18444.0, 18452.0, 18475.0, 18511.5, 18536.0, 18537.5, 18550.5, 18556.5, 18583.5, 18610.0, 18619.5, 18630.5, 18662.0, 18687.0, 18695.0, 18699.5, 18756.5, 18759.5, 18829.5, 18841.5, 18853.5, 18875.5, 18884.0, 18909.0, 18934.0, 18941.5, 18950.5, 19027.5, 19041.5, 19052.5, 19082.5, 19097.5, 19109.0, 19134.0, 19193.0, 19196.0, 19243.0, 19247.5, 19290.0, 19319.5, 19360.5, 19390.5, 19392.5, 19433.5, 19438.5, 19454.0, 19468.0, 19481.5, 19489.0, 19509.5, 19535.0, 19558.0, 19558.0, 19559.5, 19582.5, 19610.0, 19644.0, 19691.5, 19711.0, 19769.0, 19777.0, 19812.0, 19823.5, 19833.0, 19890.0, 19944.0, 19947.5, 19957.5, 19958.5, 20013.0, 20026.0, 20082.5, 20087.5, 20089.0, 20099.0, 20104.5, 20108.5, 20113.5, 20124.0, 20126.5, 20145.5, 20178.5, 20185.5, 20194.5, 20268.0, 20270.0, 20280.0, 20314.0, 20327.5, 20329.0, 20362.0, 20425.0, 20452.5, 20485.5, 20547.5, 20644.0, 20648.5, 20649.5, 20674.5, 20686.5, 20687.5, 20691.0, 20709.5, 20722.5, 20780.0, 20798.5, 20807.5, 20823.0, 20839.5, 20906.0, 20925.0, 20929.5, 20934.5, 20942.5, 20958.0, 20968.5, 20982.0, 21002.0, 21055.0, 21066.0, 21068.5, 21091.0, 21111.0, 21135.5, 21139.0, 21151.5, 21175.5, 21177.5, 21180.0, 21198.5, 21227.0, 21234.5, 21250.5, 21272.5, 21297.5, 21300.5, 21313.0, 21347.0, 21348.0, 21362.5, 21383.5, 21385.5, 21386.5, 21387.5, 21391.0, 21434.5, 21460.5, 21488.0, 21493.0, 21565.0, 21603.5, 21671.0, 21682.0, 21688.5, 21715.5, 21784.5, 21797.0, 21820.0, 21822.5, 21839.0, 21880.0, 21920.5, 21922.5, 21931.5, 21936.0, 21958.5, 22003.5, 22037.5, 22042.0, 22194.5, 22198.5, 22227.0, 22230.5, 22233.5, 22236.5, 22257.0, 22285.5, 22292.5, 22294.5, 22321.5, 22326.0, 22381.0, 22416.0, 22425.0, 22427.5, 22473.0, 22479.0, 22552.5, 22560.0, 22567.0, 22591.0, 22607.5, 22620.5, 22638.5, 22649.5, 22673.5, 22723.5, 22747.0, 22767.5, 22791.5, 22844.5, 22940.0, 23004.0, 23006.0, 23114.0, 23119.5, 23121.5, 23124.5, 23164.0, 23212.5, 23242.0, 23250.0, 23260.0, 23321.5, 23366.5, 23369.5, 23370.0, 23436.0, 23481.5, 23557.0, 23559.0, 23610.0, 23619.5, 23632.0, 23658.0, 23668.5, 23684.5, 23695.5, 23706.0, 23723.0, 23737.5, 23743.5, 23760.0, 23771.5, 23853.0, 23863.5, 23869.5, 23887.5, 23939.0, 23982.5, 23983.0, 24021.0, 24034.5, 24052.5, 24063.5, 24112.0, 24145.5, 24224.0, 24232.5, 24243.5, 24280.5, 24297.0, 24324.0, 24383.5, 24436.5, 24483.5, 24492.0, 24526.0, 24550.0, 24550.5, 24566.0, 24665.5, 24677.5, 24707.0, 24734.5, 24740.5, 24780.5, 24793.0, 24813.5, 24884.0, 24902.5, 24950.5, 24963.0, 25011.0, 25019.5, 25054.0, 25167.5, 25186.0, 25222.0, 25276.5, 25360.0, 25405.5, 25486.0, 25531.0, 25574.5, 25593.0, 25617.5, 25624.5, 25650.0, 25676.0, 25702.5, 25704.5, 25721.5, 25743.0, 25746.5, 25806.0, 25808.0, 25824.0, 25827.5, 25883.5, 25962.5, 26061.5, 26108.5, 26124.5, 26183.0, 26220.0, 26220.0, 26233.5, 26239.0, 26240.0, 26245.5, 26252.5, 26326.0, 26330.0, 26372.0, 26382.0, 26382.0, 26457.5, 26463.0, 26472.5, 26475.5, 26529.5, 26544.0, 26587.0, 26599.0, 26601.0, 26601.5, 26615.5, 26616.5, 26721.5, 26738.5, 26741.0, 26754.0, 26831.0, 26852.5, 26854.5, 26861.5, 26881.0, 26904.5, 26916.5, 26953.0, 26990.5, 26995.5, 27010.5, 27053.0, 27103.0, 27110.0, 27148.5, 27176.0, 27209.0, 27215.5, 27230.0, 27249.0, 27257.0, 27299.5, 27343.5, 27386.0, 27400.0, 27408.5, 27440.5, 27486.0, 27506.5, 27542.0, 27573.0, 27597.0, 27601.5, 27614.5, 27616.0, 27659.0, 27672.0, 27692.5, 27722.0, 27745.5, 27758.5, 27885.5, 27967.5, 27991.5, 28030.5, 28038.0, 28051.0, 28107.0, 28140.0, 28150.5, 28153.5, 28163.5, 28181.0, 28183.0, 28185.0, 28206.5, 28242.5, 28294.0, 28363.0, 28363.5, 28365.0, 28460.5, 28490.0, 28533.0, 28562.5, 28595.0, 28642.0, 28703.5, 28745.0, 28775.0, 28778.0, 28848.5, 28891.5, 28913.5, 29081.5, 29100.5, 29107.0, 29127.0, 29128.5, 29201.0, 29211.5, 29212.5, 29228.0, 29274.5, 29293.0, 29346.0, 29402.5, 29402.5, 29412.0, 29416.0, 29433.5, 29480.5, 29519.0, 29558.0, 29575.0, 29583.0, 29599.5, 29614.5, 29649.0, 29707.5, 29726.5, 29745.5, 29751.0, 29761.0, 29765.0, 29774.5, 29800.5, 29828.5, 29851.5, 29852.5, 29901.0, 29902.0, 29911.0, 29943.0, 29959.0, 30028.5, 30070.5, 30089.0, 30089.5, 30128.0, 30133.0, 30152.5, 30152.5, 30178.5, 30230.0, 30250.0, 30259.5, 30321.5, 30327.5, 30399.5, 30403.0, 30411.5, 30475.0, 30477.5, 30509.0, 30524.5, 30598.0, 30663.5, 30674.0, 30691.0, 30695.5, 30733.0, 30786.0, 30819.5, 30831.0, 30837.5, 30894.0, 30933.5, 31044.0, 31096.0, 31109.5, 31171.5, 31171.5, 31239.0, 31304.0, 31314.5, 31342.5, 31418.5, 31457.0, 31458.0, 31506.5, 31529.5, 31568.5, 31602.5, 31622.5, 31628.5, 31641.5, 31682.5, 31694.0, 31792.5, 31808.5, 31816.5, 31842.5, 31848.0, 31848.5, 31856.0, 31871.5, 31905.5, 31911.0, 31927.0, 31953.5, 32028.5, 32123.5, 32139.0, 32184.5, 32218.5, 32245.0, 32272.0, 32287.0, 32348.5, 32380.0, 32425.5, 32457.5, 32460.5, 32461.0, 32478.0, 32494.5, 32533.0, 32571.0, 32571.0, 32631.5, 32723.5, 32751.0, 32757.5, 32767.5, 32863.5, 32930.5, 32994.0, 33026.0, 33041.5, 33044.5, 33096.5, 33118.5, 33130.0, 33143.0, 33149.0, 33165.0, 33209.5, 33262.0, 33314.0, 33356.0, 33356.5, 33371.5, 33402.5, 33429.0, 33463.0, 33521.0, 33561.5, 33615.0, 33640.5, 33649.5, 33651.5, 33663.5, 33708.0, 33718.0, 33738.5, 33792.0, 33853.0, 33882.0, 33900.0, 33912.0, 33984.5, 33987.0, 34037.0, 34046.5, 34065.0, 34070.5, 34077.0, 34106.5, 34112.5, 34142.0, 34176.5, 34196.5, 34205.0, 34236.5, 34264.5, 34266.0, 34285.5, 34309.0, 34326.5, 34345.0, 34364.0, 34372.5, 34380.5, 34408.0, 34414.5, 34467.5, 34476.5, 34551.5, 34564.0, 34657.0, 34662.0, 34670.0, 34711.0, 34790.0, 34862.5, 34978.0, 34993.5, 35021.5, 35035.5, 35115.5, 35213.5, 35312.5, 35339.0, 35339.5, 35388.5, 35401.5, 35433.5, 35469.5, 35471.5, 35501.5, 35501.5, 35512.5, 35543.0, 35551.5, 35595.0, 35649.5, 35658.0, 35685.5, 35733.0, 35754.0, 35765.5, 35765.5, 35778.0, 35835.0, 35842.5, 35892.5, 35897.0, 35898.5, 35900.0, 35975.5, 35980.0, 36084.5, 36124.5, 36140.0, 36234.5, 36243.5, 36264.0, 36334.0, 36344.5, 36353.5, 36384.5, 36444.0, 36480.5, 36488.0, 36493.0, 36500.0, 36514.0, 36531.5, 36557.5, 36582.5, 36701.5, 36738.5, 36752.5, 36809.0, 36836.0, 36890.5, 36922.0, 36924.0, 36937.5, 36946.5, 36966.0, 36989.5, 37174.5, 37194.0, 37291.5, 37329.5, 37406.5, 37454.5, 37524.5, 37607.5, 37657.5, 37678.0, 37752.0, 37752.0, 37794.5, 37801.5, 37806.0, 37831.5, 37838.5, 37845.0, 37858.5, 37883.5, 37900.5, 37940.0, 38007.5, 38055.5, 38097.0, 38136.0, 38162.0, 38167.5, 38185.5, 38191.5, 38264.0, 38273.5, 38348.0, 38376.0, 38469.5, 38494.0, 38527.0, 38574.0, 38614.5, 38622.0, 38648.0, 38731.0, 38760.5, 38768.5, 38769.0, 38811.5, 38822.0, 38826.5, 38846.5, 38869.0, 38887.5, 38949.0, 39017.0, 39017.5, 39032.5, 39043.5, 39134.5, 39209.0, 39256.0, 39282.0, 39290.0, 39338.5, 39351.0, 39369.0, 39424.0, 39443.5, 39472.5, 39515.5, 39530.0, 39545.0, 39550.0, 39554.0, 39594.5, 39604.5, 39637.5, 39649.0, 39677.0, 39717.0, 39732.5, 39742.0, 39770.5, 39773.0, 39783.0, 39799.5, 39849.0, 39881.0, 39885.0, 39893.0, 39951.5, 39974.0, 40014.0, 40131.5, 40146.0, 40160.0, 40223.0, 40248.5, 40314.0, 40330.5, 40539.5, 40542.5, 40569.5, 40570.0, 40587.0, 40601.5, 40724.0, 40805.0, 40830.0, 40912.0, 40975.0, 40994.0, 41002.5, 41025.5, 41042.0, 41149.5, 41188.0, 41219.0, 41231.5, 41274.5, 41276.5, 41384.0, 41389.5, 41415.0, 41444.5, 41505.5, 41548.5, 41572.0, 41686.5, 41793.5, 41810.5, 41859.5, 41872.0, 41898.0, 41951.5, 42009.0, 42023.5, 42025.5, 42112.5, 42133.5, 42149.0, 42191.0, 42195.5, 42202.5, 42264.5, 42266.0, 42270.0, 42295.0, 42350.5, 42373.0, 42440.5, 42445.5, 42468.0, 42773.5, 42806.5, 42848.5, 42891.0, 42926.5, 43041.0, 43091.5, 43098.0, 43107.0, 43116.5, 43130.0, 43133.5, 43152.0, 43178.0, 43192.0, 43266.5, 43291.5, 43312.5, 43437.0, 43458.0, 43514.5, 43551.5, 43569.5, 43570.5, 43601.0, 43611.5, 43614.5, 43637.0, 43711.0, 43741.0, 43798.5, 43809.5, 43827.5, 43889.5, 43902.0, 43915.0, 43935.5, 43942.5, 44025.5, 44045.0, 44065.5, 44100.0, 44126.5, 44200.0, 44249.5, 44254.5, 44295.5, 44321.0, 44353.0, 44371.0, 44381.5, 44455.0, 44483.0, 44590.5, 44645.5, 44689.5, 44692.5, 44693.5, 44754.0, 44784.0, 44787.0, 44870.5, 44885.0, 44948.5, 44975.0, 45025.0, 45066.0, 45100.5, 45155.0, 45198.0, 45200.5, 45227.0, 45268.5, 45277.5, 45277.5, 45282.5, 45290.5, 45348.0, 45372.5, 45377.5, 45384.0, 45418.5, 45446.0, 45625.5, 45668.5, 45794.5, 45794.5, 45808.0, 45867.0, 45899.0, 45947.0, 46069.5, 46204.0, 46353.5, 46381.0, 46382.5, 46401.0, 46422.0, 46509.0, 46533.0, 46561.5, 46591.0, 46644.5, 46689.0, 46702.5, 46854.5, 46972.5, 46983.5, 47029.5, 47065.5, 47087.0, 47331.0, 47346.0, 47352.5, 47389.0, 47491.0, 47574.0, 47657.5, 47660.5, 47750.0, 47779.0, 47840.0, 47850.5, 47904.0, 47930.0, 47955.0, 47999.0, 48069.0, 48110.5, 48117.0, 48170.0, 48174.0, 48179.0, 48202.5, 48263.0, 48300.5, 48319.5, 48371.0, 48372.5, 48543.5, 48558.5, 48695.5, 48800.5, 48885.5, 48951.5, 49014.5, 49058.5, 49080.0, 49091.0, 49117.0, 49276.0, 49299.5, 49337.5, 49453.0, 49478.0, 49559.5, 49577.0, 49603.5, 49624.5, 49725.5, 49747.5, 49785.0, 49793.0, 49878.5, 49880.0, 49896.5, 49961.0, 50180.0, 50180.5, 50202.5, 50226.0, 50266.5, 50307.0, 50388.0, 50395.5, 50475.0, 50503.0, 50548.5, 50571.5, 50674.5, 50727.0, 50756.5, 50790.5, 50796.0, 50940.0, 50957.5, 50984.5, 51033.5, 51110.5, 51155.0, 51158.0, 51192.5, 51235.0, 51271.5, 51276.0, 51337.0, 51365.0, 51379.0, 51589.0, 51592.5, 51699.0, 51737.5, 51745.5, 51753.0, 51772.0, 51783.5, 51848.0, 51928.5, 51976.0, 51985.5, 51997.5, 52092.5, 52112.5, 52357.5, 52413.0, 52515.0, 52598.5, 52607.5, 52642.0, 52675.5, 52719.0, 52731.5, 52774.0, 52810.5, 52862.0, 52894.5, 52923.0, 52942.0, 52970.0, 53068.5, 53125.0, 53128.0, 53283.0, 53341.0, 53500.5, 53500.5, 53779.0, 53933.5, 54007.5, 54090.0, 54091.0, 54093.0, 54115.5, 54120.5, 54128.5, 54163.0, 54180.5, 54282.5, 54328.0, 54442.5, 54468.0, 54540.0, 54688.0, 54721.5, 54750.5, 54798.0, 54875.0, 54903.5, 55088.5, 55162.5, 55309.5, 55439.5, 55492.5, 55585.0, 55837.0, 56061.5, 56146.0, 56153.0, 56210.0, 56252.5, 56324.0, 56375.0, 56529.5, 56567.5, 56632.0, 56673.5, 56708.0, 56721.0, 56725.5, 56760.5, 56923.0, 56965.0, 56966.0, 57004.0, 57022.0, 57026.0, 57035.0, 57194.5, 57221.5, 57231.5, 57241.0, 57245.0, 57321.0, 57358.0, 57360.5, 57368.5, 57388.5, 57436.0, 57447.5, 57509.5, 57510.5, 57538.5, 57613.5, 57666.0, 57765.0, 57798.0, 57829.0, 57931.0, 57933.5, 57996.5, 58284.5, 58368.5, 58445.0, 58448.5, 58631.5, 58770.5, 58811.0, 58899.0, 58922.5, 59038.5, 59084.0, 59294.5, 59310.0, 59312.5, 59326.5, 59354.0, 59449.5, 59530.5, 59556.0, 59739.5, 59845.0, 59886.5, 59907.0, 59921.5, 60079.0, 60120.5, 60124.0, 60187.5, 60234.0, 60315.5, 60380.5, 60524.5, 60525.0, 60657.5, 60681.5, 60683.0, 60718.0, 60824.5, 60848.5, 60860.5, 60924.5, 60927.5, 60974.0, 61028.5, 61229.0, 61240.5, 61324.0, 61365.5, 61381.0, 61455.5, 61488.5, 61554.5, 61568.5, 61575.5, 61605.5, 61612.0, 61725.0, 61843.5, 61945.0, 61994.5, 62187.5, 62243.0, 62324.0, 62350.0, 62350.5, 62351.5, 62354.5, 62521.5, 62598.5, 62656.0, 62845.0, 62849.0, 63038.0, 63076.0, 63116.0, 63209.5, 63259.5, 63295.5, 63348.0, 63418.0, 63473.0, 63490.5, 63496.5, 63551.0, 63592.0, 63619.5, 63699.0, 63731.0, 63756.0, 63774.0, 63839.5, 63895.0, 63911.0, 64038.5, 64119.5, 64154.0, 64169.0, 64187.0, 64261.0, 64277.0, 64280.5, 64444.0, 64490.5, 64511.5, 64560.5, 64662.0, 64665.0, 64846.5, 64854.0, 65001.5, 65037.0, 65096.5, 65137.0, 65219.5, 65291.0, 65316.5, 65422.5, 65561.0, 65696.5, 65703.5, 65757.5, 65877.0, 65893.5, 65957.5, 66019.0, 66191.5, 66202.0, 66296.0, 66511.5, 66573.0, 66654.5, 66697.5, 66778.5, 66888.0, 66897.0, 66937.5, 66948.5, 67260.5, 67314.5, 67325.0, 67377.0, 67383.0, 67418.5, 67480.0, 67760.5, 67788.5, 67936.5, 68109.5, 68184.0, 68233.0, 68372.5, 68470.5, 68663.0, 68674.5, 68692.5, 68862.0, 68866.5, 68928.5, 69053.0, 69175.0, 69263.5, 69355.5, 69402.5, 69445.0, 69662.5, 69782.5, 69970.0, 69980.0, 70028.5, 70070.0, 70185.5, 70201.5, 70203.5, 70258.0, 70302.5, 70411.5, 70549.0, 70582.0, 70706.5, 70797.0, 70859.0, 70875.5, 70976.0, 70995.5, 71012.5, 71070.0, 71076.5, 71329.5, 71410.5, 71496.0, 71610.5, 71818.0, 71849.5, 71922.5, 71941.5, 71959.0, 72004.5, 72045.5, 72128.0, 72130.0, 72204.0, 72210.0, 72302.0, 72432.0, 72466.0, 72487.5, 72498.5, 72641.5, 72767.5, 72779.0, 72803.5, 72980.0, 73055.0, 73138.5, 73219.0, 73254.0, 73294.5, 73353.0, 73419.5, 73423.0, 73481.0, 73540.5, 73592.5, 73679.5, 73856.0, 73892.0, 73916.0, 73956.5, 74054.0, 74128.0, 74308.0, 74359.5, 74463.0, 74769.5, 74826.5, 75060.5, 75085.5, 75102.0, 75133.0, 75370.5, 75428.0, 75449.0, 75536.0, 75594.0, 75623.5, 75744.0, 75791.0, 75792.0, 75828.5, 75884.5, 75994.5, 76004.5, 76030.5, 76109.5, 76159.5, 76192.0, 76200.0, 76227.0, 76274.5, 76323.5, 76408.5, 76411.0, 76490.0, 76596.5, 76656.0, 76684.0, 76699.5, 76911.0, 76946.0, 77039.0, 77061.0, 77090.5, 77563.5, 77613.0, 77654.5, 78057.5, 78145.5, 78282.5, 78547.5, 78712.0, 78947.0, 79380.0, 79383.5, 79440.5, 79692.0, 79727.5, 79887.0, 79926.5, 79948.0, 79980.5, 80013.5, 80020.0, 80207.5, 80250.5, 80531.0, 80741.0, 80743.5, 80789.0, 80824.5, 81055.0, 81077.0, 81108.0, 81134.0, 81231.5, 81294.5, 81324.5, 81393.5, 81418.0, 81586.0, 81712.5, 81869.0, 81923.0, 81942.5, 81952.0, 82020.5, 82024.0, 82121.5, 82131.0, 82277.0, 82329.5, 82489.5, 82493.5, 82558.5, 82614.5, 82643.0, 82670.5, 82848.0, 82969.5, 82987.5, 82992.5, 83066.0, 83106.5, 83159.5, 83210.5, 83234.0, 83346.5, 83409.0, 83494.0, 83579.0, 83602.5, 83682.0, 84146.0, 84196.5, 84235.0, 84266.0, 84393.0, 84523.5, 84540.5, 84557.0, 84614.0, 84870.0, 84954.5, 85037.5, 85060.5, 85274.5, 85296.0, 85311.0, 85392.0, 85463.0, 85489.5, 85547.5, 85692.0, 85820.5, 85830.0, 85846.5, 85921.0, 85991.5, 86068.0, 86095.0, 86118.0, 86161.0, 86293.0, 86521.5, 86739.0, 86822.5, 86823.5, 86920.0, 86997.0, 87210.5, 87242.5, 87292.5, 87300.5, 87325.5, 87645.0, 87680.0, 87834.0, 87948.5, 88064.0, 88084.0, 88242.5, 88242.5, 88261.0, 88317.5, 88493.0, 88538.5, 88598.5, 88705.5, 88712.5, 88730.5, 89300.0, 89467.0, 89530.5, 89574.0, 89634.5, 89715.5, 89800.5, 89944.5, 90041.0, 90063.0, 90161.0, 90179.0, 90377.0, 90456.0, 90505.0, 91244.0, 91398.0, 91601.0, 91775.0, 91935.5, 91975.5, 92041.5, 92056.0, 92208.0, 92225.5, 92230.0, 92682.5, 92911.0, 92968.0, 93112.0, 93118.0, 93231.5, 93389.0, 93392.0, 93509.0, 93512.5, 93631.0, 93651.5, 94088.0, 94102.0, 94151.5, 94196.0, 94447.0, 94571.5, 94806.5, 94928.0, 94979.0, 95036.5, 95156.0, 95256.0, 95290.5, 95319.0, 95360.0, 95405.0, 95586.5, 95607.0, 95645.5, 95745.5, 95860.5, 96034.5, 96211.0, 96289.0, 96350.5, 96373.0, 96386.5, 96498.5, 96592.5, 96628.5, 96638.5, 96653.0, 96696.5, 96697.5, 96762.0, 96958.0, 97318.0, 97382.5, 97579.0, 97789.5, 97834.0, 97902.0, 97908.0, 98145.5, 98385.5, 98580.5, 98941.0, 99018.0, 99117.5, 99220.0, 99364.5, 100081.0, 100084.5, 100149.5, 100153.5, 100225.0, 100468.0, 100707.5, 100802.0, 101057.0, 101067.0, 101139.5, 101157.0, 101186.5, 101268.5, 101411.0, 101677.5, 101692.5, 102262.0, 102295.5, 102319.0, 102390.0, 102772.0, 102794.0, 103159.0, 103268.5, 103299.5, 103460.0, 103557.0, 103560.5, 103696.5, 103912.0, 104513.0, 104576.5, 105160.0, 105617.0, 105692.0, 105780.0, 105788.5, 105935.0, 106013.5, 106091.5, 106448.0, 106519.0, 106804.5, 106884.0, 107155.0, 107431.5, 107495.0, 107737.0, 108022.0, 108207.0, 108228.5, 108618.5, 108823.0, 108914.0, 109213.0, 109460.0, 109649.0, 109725.5, 109844.5, 109874.5, 109930.5, 110073.0, 110290.0, 110733.0, 110738.0, 110814.0, 110889.0, 111250.0, 111407.0, 111660.0, 111766.5, 111806.5, 111808.0, 112118.5, 112184.5, 112431.5, 112620.0, 112830.0, 112834.0, 113034.5, 113040.0, 113191.0, 113205.5, 113464.5, 113506.0, 113947.0, 113971.0, 113982.5, 114548.0, 114563.0, 114584.5, 114595.5, 114675.5, 114737.0, 114946.0, 114965.0, 115089.0, 115102.0, 115196.0, 115217.0, 115457.5, 115568.5, 115684.5, 115914.0, 116003.5, 116515.5, 116524.5, 116600.5, 116886.0, 116977.0, 117292.0, 117338.0, 117762.0, 117805.0, 117931.5, 118052.5, 118356.0, 118598.0, 118762.0, 119012.5, 119259.5, 119996.5, 120336.5, 120757.0, 120814.5, 120976.0, 120983.0, 121185.5, 121194.0, 121458.0, 121560.0, 121583.5, 121736.0, 122036.0, 122240.5, 122408.5, 122485.0, 122885.5, 123198.0, 123211.0, 123347.5, 123848.5, 124211.0, 124638.5, 125085.5, 125202.0, 125549.0, 125604.0, 125629.0, 125799.5, 125951.0, 125952.5, 126309.5, 126335.0, 126408.5, 126622.0, 126804.0, 126963.0, 126988.5, 127268.0, 127385.0, 127425.0, 127499.0, 127521.5, 128061.0, 128261.5, 128479.0, 128723.0, 129193.5, 129290.5, 129355.5, 129908.0, 130210.0, 130265.5, 130420.0, 130484.5, 130793.5, 130908.0, 130982.5, 131352.0, 131375.5, 131437.5, 131508.5, 131549.5, 131636.5, 132171.0, 132178.5, 132754.5, 132965.0, 133985.5, 134233.5, 134646.5, 135291.0, 135357.0, 135912.0, 135928.5, 136499.5, 136543.5, 136815.5, 136844.5, 136861.5, 137446.0, 137653.0, 137968.0, 138062.0, 138120.0, 138128.0, 138142.5, 138236.5, 138271.5, 138313.5, 138336.0, 138429.0, 138558.0, 138824.5, 138886.5, 140079.0, 140140.5, 140276.5, 140298.0, 140427.0, 140602.5, 141914.0, 142038.5, 142116.5, 142815.0, 142952.0, 143011.0, 143017.5, 143115.0, 143392.0, 143485.0, 144097.0, 144119.0, 144127.0, 144583.5, 144673.0, 144818.5, 145303.5, 145777.5, 146004.5, 146269.5, 146557.0, 146600.0, 146811.5, 147051.5, 147748.5, 147855.0, 148389.5, 148510.5, 148588.5, 149456.0, 150066.0, 150324.0, 150431.5, 150554.0, 151215.5, 151500.5, 151580.0, 151742.5, 151908.5, 152026.5, 152304.5, 152310.0, 152888.5, 152975.5, 153357.0, 154165.5, 154210.5, 154573.0, 154890.0, 154927.0, 155051.5, 155130.0, 155520.0, 155629.0, 155781.0, 155910.5, 156072.0, 156360.0, 156489.0, 156509.5, 156744.5, 156773.5, 156811.0, 157274.0, 157596.0, 157983.5, 158473.5, 158659.0, 158782.0, 159032.5, 159727.0, 159995.0, 160086.5, 160278.5, 160406.0, 160638.0, 161185.0, 161668.5, 161948.5, 161953.5, 162684.5, 162694.5, 163034.0, 163121.0, 164662.5, 164725.5, 165328.5, 165408.0, 165499.5, 166433.5, 167029.5, 167109.0, 167296.5, 167901.5, 167992.0, 168542.5, 168649.5, 168700.0, 169112.0, 169203.0, 169593.5, 169689.0, 169830.0, 170345.5, 170908.0, 171901.0, 172141.0, 172572.0, 172977.0, 173222.5, 173313.5, 173675.5, 173682.5, 173923.5, 174031.5, 174118.0, 174338.5, 174813.0, 174988.0, 175103.5, 175452.5, 176947.0, 177165.5, 177776.5, 177819.5, 178012.5, 179152.0, 180016.5, 181055.0, 181222.5, 181505.5, 182181.5, 182345.5, 182527.5, 183099.0, 183980.0, 184034.0, 184907.0, 185288.5, 185602.5, 185762.5, 185782.5, 185860.0, 186112.5, 188219.0, 188436.5, 188673.0, 189166.0, 189696.0, 189905.0, 190403.0, 190403.0, 190706.5, 191400.5, 192392.0, 192426.0, 193566.0, 194054.5, 194517.5, 194529.0, 194707.0, 194776.5, 195396.5, 195533.5, 196071.5, 196362.5, 196867.5, 196889.0, 197319.5, 197823.5, 197974.5, 198343.5, 199213.0, 199638.0, 199811.5, 200132.5, 200142.0, 200311.0, 201291.0, 201451.0, 201649.5, 202086.5, 202817.5, 203922.5, 204009.0, 204024.0, 204400.0, 204537.5, 204639.0, 205129.0, 205480.0, 205575.5, 205931.0, 205959.0, 206282.5, 207690.5, 208536.5, 209183.0, 209678.0, 209987.5, 210473.5, 210612.0, 210855.5, 210996.0, 211045.5, 211232.5, 212768.5, 213373.0, 213657.5, 213678.0, 213850.0, 214100.0, 214482.0, 214763.0, 215507.5, 215790.0, 215823.0, 216197.0, 216369.5, 216891.5, 217102.0, 217956.5, 218044.5, 218540.5, 218618.5, 219324.0, 219823.5, 220797.0, 221283.0, 222142.0, 224746.0, 224857.0, 225738.0, 225898.0, 227024.5, 227250.0, 227484.0, 227509.5, 227785.0, 228194.5, 228262.5, 231395.5, 231788.5, 232615.5, 232691.5, 232882.5, 234506.0, 235062.5, 235083.5, 235651.5, 235964.0, 236590.5, 236713.0, 237761.5, 238007.0, 238063.0, 238485.5, 238800.0, 238867.0, 239380.0, 240102.0, 240887.0, 241366.0, 241855.5, 241927.5, 242120.5, 243795.5, 243930.0, 244429.5, 244742.5, 246679.5, 252211.0, 253248.0, 254128.5, 255115.0, 255154.5, 256884.5, 257446.5, 258049.0, 259297.5, 260413.0, 261734.0, 262364.0, 262586.0, 264111.5, 264833.0, 265213.5, 265306.0, 265315.5, 265586.0, 266216.5, 266565.5, 269666.5, 269848.5, 271002.0, 271524.5, 271975.5, 272367.5, 273081.0, 273114.0, 273371.0, 273694.5, 274477.0, 274687.5, 275259.5, 275450.5, 275805.0, 276472.5, 277085.5, 277203.5, 278264.0, 279299.5, 282358.5, 282715.0, 285946.0, 286664.5, 287324.0, 287460.0, 287507.0, 288081.0, 288407.0, 291015.0, 291338.0, 292154.0, 292644.5, 295113.0, 295411.0, 296209.5, 297081.5, 297384.5, 297706.5, 298518.0, 298635.5, 298836.0, 299845.0, 302485.5, 302975.0, 303137.5, 306182.0, 306739.5, 307292.0, 310066.0, 312021.0, 312470.0, 313348.0, 315064.5, 315453.5, 318163.5, 319584.0, 320155.5, 320647.5, 320863.5, 321353.5, 322216.5, 322798.5, 323096.5, 325197.0, 326633.5, 327897.0, 328805.5, 329165.0, 329894.5, 330234.0, 330666.5, 331396.0, 331443.5, 331781.5, 332071.5, 334023.0, 334791.5, 335443.0, 335822.0, 335980.5, 336321.5, 338180.5, 338850.0, 339124.5, 341451.0, 342457.0, 345553.5, 352019.0, 355382.5, 355567.0, 356419.0, 356608.5, 357634.0, 357980.5, 359045.0, 359494.0, 360255.0, 361588.0, 363717.0, 364200.5, 365117.0, 368412.0, 368593.0, 369478.5, 370172.0, 370303.0, 372674.5, 373428.0, 377082.0, 377796.5, 379264.0, 379361.5, 381615.5, 382188.5, 383697.0, 387789.5, 388732.0, 391038.0, 391219.0, 391734.0, 392149.0, 395894.5, 397605.5, 402769.0, 404682.5, 405897.5, 406835.5, 407653.5, 408173.5, 412533.5, 414750.0, 415945.5, 416192.0, 419601.5, 429515.0, 432218.5, 433599.0, 434050.0, 434690.5, 439973.5, 440892.5, 449592.5, 450007.5, 451024.0, 455340.0, 455649.5, 455731.5, 458367.5, 463245.0, 464700.5, 466569.5, 471290.5, 471903.5, 475332.0, 482390.5, 483389.5, 486309.5, 489399.5, 489661.0, 491551.0, 492648.0, 493716.0, 494139.5, 494661.0, 496041.5, 496053.0, 506986.5, 509464.0, 509685.0, 511291.0, 516047.0, 516896.5, 519268.5, 520458.0, 523040.5, 523799.5, 524449.5, 526831.5, 529653.0, 532686.5, 536959.5, 538905.5, 547029.0, 547283.5, 547663.0, 548484.5, 553199.5, 561284.5, 561795.0, 563872.0, 566661.0, 572744.0, 574110.5, 575038.5, 580411.0, 583282.0, 583549.5, 587944.5, 589699.5, 596123.0, 600769.5, 603609.0, 606725.0, 611865.0, 616511.5, 619920.5, 620226.5, 621086.5, 624050.5, 632111.5, 635957.0, 637258.0, 637533.5, 638621.5, 644182.5, 648209.5, 660511.0, 662589.0, 668351.5, 671075.5, 687235.0, 691622.0, 693248.5, 697132.5, 699442.0, 709646.5, 710319.0, 730158.0, 731247.5, 736126.0, 738736.5, 742569.0, 746129.5, 754323.5, 754886.0, 761062.5, 763822.0, 769661.0, 771391.0, 777628.0, 782960.5, 783817.5, 786206.5, 791353.0, 796658.5, 803439.5, 804494.5, 816277.5, 819141.5, 822248.0, 828930.5, 829901.0, 830321.5, 846649.5, 850666.0, 860304.5, 866784.5, 877342.5, 896740.0, 897953.0, 898073.0, 904580.0, 905033.0, 905465.5, 913927.0, 923289.0, 949088.5, 960919.5, 967893.0, 970311.0, 971369.0, 973803.0, 990416.0, 1011849.0, 1017333.5, 1023812.0, 1024256.0, 1025093.5, 1028073.5, 1036202.0, 1048524.0, 1084504.0, 1088207.5, 1091301.0, 1092363.5, 1096649.5, 1122398.5, 1124051.0, 1126563.5, 1130932.0, 1141381.0, 1143823.5, 1147248.0, 1159200.5, 1164822.5, 1179012.0, 1185355.5, 1186137.0, 1199373.0, 1213850.0, 1226447.0, 1227795.0, 1241730.5, 1250344.5, 1271099.5, 1308830.5, 1335205.5, 1354857.0, 1367452.0, 1384484.0, 1393260.5, 1407600.5, 1444082.0, 1456089.5, 1463575.0, 1466029.0, 1487542.5, 1496226.0, 1509531.0, 1550382.5, 1557642.0, 1620713.5, 1621954.5, 1647187.5, 1651719.5, 1656036.0, 1671846.0, 1679297.0, 1743127.0, 1771679.0, 1777221.0, 1796861.0, 1818715.5, 1834863.5, 1840023.5, 1868414.0, 2030366.0, 2034570.0, 2062262.0, 2165584.5, 2177527.0, 2282103.5, 2296950.0, 2303886.5, 2372053.5, 2391551.5, 2456863.0, 2608833.0, 2860748.0, 2904168.0, 2910685.5, 2938108.0, 2943333.0, 3154520.5, 3201162.0, 3220943.0, 3272794.0, 3325314.5, 3372201.5, 4128670.5, 4676037.0, 4858142.0, 5239794.0, 8991108.5, 15937146.5]
count    323.000000
mean       0.000098
std        0.000029
min        0.000034
25%        0.000081
50%        0.000095
75%        0.000115
max        0.000227
Name: naiveproportion, dtype: float64

Now we can explain why our initial flagged set of high cancer rate counties is dominated by counties with low populations. Consider a county with a small population of 1000. In the given ten-year period, suppose this county has 1 kidney cancer death. The naive proportion would then be 1/1000 = 0.001 which is already 10 times the typical rate of 0.0001. So this county would immediately be flagged as a high cancer rate county just because of the single death. Go back to our flagged list of high population counties and check the population sizes.

In [11]:
d[d['naiveproportion'] >= threshold]['popm'].describe()
#The county with the largest population in this list is 60000 but there are counties with population as small as 1448.
Out[11]:
count      100.000000
mean     16940.950000
std      11922.792393
min       1448.000000
25%       8442.625000
50%      14204.250000
75%      21550.375000
max      60380.500000
Name: popm, dtype: float64

Bayesian estimates of the kidney cancer death rates¶

For Bayesian estimates, we need an appropriate prior. One way of obtaining such a prior is to consider the naive proportions of the large population counties (say, with population at least 300000) and to fit a Beta density to the naive proportions. This can be done via mean and variance as follows.

In [12]:
proportions_largecounties = d['naiveproportion'][d['popm'] >= 300000] #filter out the high population counties
mean_largecounties = np.mean(proportions_largecounties)
var_largecounties = np.var(proportions_largecounties)
display(mean_largecounties, var_largecounties)
9.813238615946435e-05
8.172661645662484e-10

For what values of $a$ and $b$ would the $\text{Beta}(a, b)$ density have mean $m$ and variance $V$. To answer this, we need to basically solve \begin{align*} \frac{a}{a+b} = m ~~ \text{ and } ~~ \frac{a}{a+b} \frac{b}{a+b} \frac{1}{a+b+1} = V. \end{align*} Plugging $m$ for $\frac{a}{a+b}$ and $1-m$ for $\frac{b}{a+b}$ in the second equation, we obtain \begin{align*} \frac{m(1-m)}{a+b+1} = V \implies a+b = \frac{m(1-m)}{V} - 1. \end{align*} Combining this with $\frac{a}{a+b} = m$, we obtain \begin{align*} a = m\left(\frac{m(1-m)}{V} - 1 \right) ~~~ \text{ and } ~~~ b = (1-m)\left(\frac{m(1-m)}{V} - 1 \right) \end{align*} Of course, $m$ and $V$ should be such that $a$ and $b$ as computed above should be strictly bigger than zero. If not, this method of figuring out $a$ and $b$ using mean and variance does not work.

In [13]:
#Using the above formula, we obtain
m = mean_largecounties
V = var_largecounties
a = ((m*m*(1-m))/V) - m
b = (((1-m)*(1-m)*m)/V) - (1-m)
display(a, b)
11.781889938777496
120049.39668603124

Note that here $a$ is much smaller than $b$ which means that the Beta density will be concentrated on very small values (this makes sense because this is intended to be a prior for the kidney cancer rate which we believe to be small with typical values around 0.0001).

In [14]:
#Plot of this Beta density
x = np.arange(0, 8e-4, 1e-6)
y = beta.pdf(x, a, b)
plt.plot(x, y, label = f"Beta({a}, {b})")
plt.title(f'Beta Density for a = {a} and b = {b}')
plt.xlabel("x")
plt.ylabel("Density")
plt.legend()
plt.show()
No description has been provided for this image

We now compute the Bayesian estimates of kidney cancer death rates. For a county with population $POP$ and death count $DC$, our posterior density of the rate would be \begin{align*} \text{Beta}(a + DC, b + POP - DC) \end{align*} Thus the posterior mean (which will be our Bayesian estimate of the kidney cancer death rate) is \begin{align*} \frac{a + DC}{a + b + POP}. \end{align*}

In [15]:
d['bayesestimate'] = (a + d['dct']) / (a + b + d['popm'])
#We can now flag the top 100 counties using the Bayesian estimate:
threshold = d['bayesestimate'].nlargest(100).iloc[-1]
d['bayeshigh'] = d['bayesestimate'] >= threshold
#Let us look at these counties to see if they are different from the previous flagged list:
with pd.option_context('display.max_rows', None):
    display(d[d['bayeshigh'] >= True])
state Location fips dc pop dc.2 pop.2 dct popm naiveproportion cancerhigh modfips bayesestimate bayeshigh
61 ALABAMA Tallapoosa County, Alabama 1123 8 67871 7 68002 15 67936.5 0.000221 False 01123 0.000142 True
71 ARIZONA Graham County, Arizona 4009 7 51342 6 53873 13 52607.5 0.000247 False 04009 0.000144 True
74 ARIZONA Mohave County, Arizona 4015 8 149437 22 189941 30 169689.0 0.000177 False 04015 0.000144 True
79 ARIZONA Yavapai County, Arizona 4025 14 176766 21 223856 35 200311.0 0.000175 False 04025 0.000146 True
106 ARKANSAS Garland County, Arkansas 5051 10 154925 18 158622 28 156773.5 0.000179 False 05051 0.000144 True
148 ARKANSAS Sharp County, Arkansas 5135 4 34922 5 33806 9 34364.0 0.000262 True 05135 0.000135 True
172 CALIFORNIA Lake County, California 6033 12 93785 11 109600 23 101692.5 0.000226 False 06033 0.000157 True
183 CALIFORNIA Napa County, California 6055 13 236568 29 247143 42 241855.5 0.000174 False 06055 0.000149 True
280 CONNECTICUT Middlesex County, Connecticut 9007 20 304656 35 320284 55 312470.0 0.000176 False 09007 0.000154 True
294 FLORIDA Broward County, Florida 12011 197 2212734 207 2381166 404 2296950.0 0.000176 False 12011 0.000172 True
296 FLORIDA Charlotte County, Florida 12015 16 154371 27 209992 43 182181.5 0.000236 False 12015 0.000181 True
297 FLORIDA Citrus County, Florida 12017 14 146051 23 191349 37 168700.0 0.000219 False 12017 0.000169 True
299 FLORIDA Collier County, Florida 12021 22 229457 25 301715 47 265586.0 0.000177 False 12021 0.000152 True
314 FLORIDA Hernando County, Florida 12053 8 125721 21 191597 29 158659.0 0.000183 False 12053 0.000146 True
323 FLORIDA Lee County, Florida 12071 36 509853 62 657246 98 583549.5 0.000168 False 12071 0.000156 True
328 FLORIDA Manatee County, Florida 12081 31 345451 27 413077 58 379264.0 0.000153 False 12081 0.000140 True
329 FLORIDA Marion County, Florida 12083 16 280325 33 361402 49 320863.5 0.000153 False 12083 0.000138 True
330 FLORIDA Martin County, Florida 12085 18 161947 15 202744 33 182345.5 0.000181 False 12085 0.000148 True
338 FLORIDA Palm Beach County, Florida 12099 111 1314930 145 1617128 256 1466029.0 0.000175 False 12099 0.000169 True
339 FLORIDA Pasco County, Florida 12101 58 498118 54 598851 112 548484.5 0.000204 False 12101 0.000185 True
340 FLORIDA Pinellas County, Florida 12103 146 1612081 153 1746513 299 1679297.0 0.000178 False 12103 0.000173 True
346 FLORIDA Sarasota County, Florida 12115 55 485595 65 573711 120 529653.0 0.000227 False 12115 0.000203 True
352 FLORIDA Volusia County, Florida 12127 43 596543 50 724479 93 660511.0 0.000141 False 12127 0.000134 True
568 ILLINOIS Bureau County, Illinois 17011 7 92928 12 87198 19 90063.0 0.000211 False 17011 0.000146 True
570 ILLINOIS Carroll County, Illinois 17015 5 44858 7 41446 12 43152.0 0.000278 True 17015 0.000146 True
575 ILLINOIS Clay County, Illinois 17025 8 37117 5 35052 13 36084.5 0.000360 True 17025 0.000159 True
612 ILLINOIS LaSalle County, Illinois 17099 22 265008 25 255818 47 260413.0 0.000180 False 17099 0.000154 True
618 ILLINOIS Macoupin County, Illinois 17117 9 116685 11 114684 20 115684.5 0.000173 False 17117 0.000135 True
630 ILLINOIS Montgomery County, Illinois 17135 10 77006 7 75055 17 76030.5 0.000224 False 17135 0.000147 True
683 INDIANA Dubois County, Indiana 18037 9 85189 7 88456 16 86822.5 0.000184 False 18037 0.000134 True
732 INDIANA Randolph County, Indiana 18135 11 70673 5 67051 16 68862.0 0.000232 False 18135 0.000147 True
789 IOWA Fayette County, Iowa 19065 6 60513 7 53969 13 57241.0 0.000227 False 19065 0.000140 True
793 IOWA Greene County, Iowa 19073 4 28283 4 25160 8 26721.5 0.000299 True 19073 0.000135 True
819 IOWA Marion County, Iowa 19125 8 72517 6 72415 14 72466.0 0.000193 False 19125 0.000134 True
833 IOWA Polk County, Iowa 19153 45 683277 52 699967 97 691622.0 0.000140 False 19153 0.000134 True
839 IOWA Shelby County, Iowa 19165 4 36486 5 33239 9 34862.5 0.000258 True 19165 0.000134 True
844 IOWA Union County, Iowa 19175 7 32300 2 30043 9 31171.5 0.000289 True 19175 0.000137 True
933 KANSAS Reno County, Kansas 20155 15 152347 15 148516 30 150431.5 0.000199 False 20155 0.000154 True
935 KANSAS Rice County, Kansas 20159 5 27283 5 25668 10 26475.5 0.000378 True 20159 0.000149 True
951 KANSAS Sumner County, Kansas 20191 5 60141 12 60620 17 60380.5 0.000282 True 20191 0.000160 True
1039 KENTUCKY McCracken County, Kentucky 21145 18 130840 8 130747 26 130793.5 0.000199 False 21145 0.000151 True
1081 LOUISIANA Acadia Parish, Louisiana 22001 8 115970 12 114422 20 115196.0 0.000174 False 22001 0.000135 True
1115 LOUISIANA Natchitoches Parish, Louisiana 22069 7 60729 6 57924 13 59326.5 0.000219 False 22069 0.000138 True
1116 LOUISIANA Orleans Parish, Louisiana 22071 46 561106 41 484975 87 523040.5 0.000166 False 22071 0.000154 True
1172 MARYLAND Garrett County, Maryland 24023 5 65546 9 65847 14 65696.5 0.000213 False 24023 0.000139 True
1185 MASSACHUSETTS Barnstable County, Massachusetts 25001 23 354916 36 403807 59 379361.5 0.000156 False 25001 0.000142 True
1230 MICHIGAN Huron County, Michigan 26063 8 88924 9 86973 17 87948.5 0.000193 False 26063 0.000138 True
1249 MICHIGAN Manistee County, Michigan 26101 9 54118 3 51671 12 52894.5 0.000227 False 26101 0.000138 True
1263 MICHIGAN Ogemaw County, Michigan 26129 3 41030 7 43017 10 42023.5 0.000238 False 26129 0.000134 True
1270 MICHIGAN Roscommon County, Michigan 26143 5 41957 6 44959 11 43458.0 0.000253 True 26143 0.000139 True
1295 MINNESOTA Clay County, Minnesota 27027 12 117417 8 115614 20 116515.5 0.000172 False 27027 0.000134 True
1312 MINNESOTA Itasca County, Minnesota 27061 8 103740 13 98574 21 101157.0 0.000208 False 27061 0.000148 True
1330 MINNESOTA Morrison County, Minnesota 27097 11 72683 6 72875 17 72779.0 0.000234 False 27097 0.000149 True
1337 MINNESOTA Otter Tail County, Minnesota 27111 11 128042 14 125202 25 126622.0 0.000197 False 27111 0.000149 True
1350 MINNESOTA Saint Louis County, Minnesota 27137 36 517061 37 472261 73 494661.0 0.000148 False 27137 0.000138 True
1358 MINNESOTA Todd County, Minnesota 27153 5 62645 8 59836 13 61240.5 0.000212 False 27153 0.000137 True
1367 MINNESOTA Wright County, Minnesota 27171 12 149423 13 160837 25 155130.0 0.000161 False 27171 0.000134 True
1455 MISSOURI Barry County, Missouri 29009 5 59669 9 64018 14 61843.5 0.000226 False 29009 0.000142 True
1457 MISSOURI Bates County, Missouri 29013 9 37094 2 36524 11 36809.0 0.000299 True 29013 0.000145 True
1485 MISSOURI Dunklin County, Missouri 29069 7 78502 8 72396 15 75449.0 0.000199 False 29069 0.000137 True
1487 MISSOURI Gasconade County, Missouri 29073 6 32412 7 33035 13 32723.5 0.000397 True 29073 0.000162 True
1545 MISSOURI Saint Louis City, Missouri 29510 43 530715 31 503078 74 516896.5 0.000143 False 29510 0.000135 True
1557 MISSOURI Taney County, Missouri 29213 7 51901 6 58424 13 55162.5 0.000236 False 29213 0.000141 True
1635 NEBRASKA Cedar County, Nebraska 31027 4 27890 4 25872 8 26881.0 0.000298 True 31027 0.000135 True
1746 NEW.JERSEY Cape May County, New Jersey 34009 13 190513 18 206174 31 198343.5 0.000156 False 34009 0.000134 True
1748 NEW.JERSEY Essex County, New Jersey 34013 74 1186092 84 1108404 158 1147248.0 0.000138 False 34013 0.000134 True
1756 NEW.JERSEY Ocean County, New Jersey 34029 56 823481 82 931204 138 877342.5 0.000157 False 34029 0.000150 True
1810 NEW.YORK Essex County, New York 36031 12 88129 6 88356 18 88242.5 0.000204 False 36031 0.000143 True
1901 NORTH.CAROLINA Henderson County, North Carolina 37089 11 140912 15 152711 26 146811.5 0.000177 False 37089 0.000142 True
1909 NORTH.CAROLINA Lee County, North Carolina 37105 6 70067 10 74908 16 72487.5 0.000221 False 37105 0.000144 True
1937 NORTH.CAROLINA Rutherford County, North Carolina 37161 7 115562 13 119022 20 117292.0 0.000171 False 37161 0.000134 True
2123 OKLAHOMA Grady County, Oklahoma 40051 10 96371 7 95698 17 96034.5 0.000177 False 40051 0.000133 True
2135 OKLAHOMA Kiowa County, Oklahoma 40075 6 27698 3 24807 9 26252.5 0.000343 True 40075 0.000142 True
2144 OKLAHOMA McClain County, Oklahoma 40087 7 51642 6 53979 13 52810.5 0.000246 False 40087 0.000143 True
2160 OKLAHOMA Pottawatomie County, Oklahoma 40125 11 125946 10 125653 21 125799.5 0.000167 False 40125 0.000133 True
2165 OKLAHOMA Sequoyah County, Oklahoma 40135 6 62768 7 64416 13 63592.0 0.000204 False 40135 0.000135 True
2166 OKLAHOMA Stephens County, Oklahoma 40137 7 104268 13 97866 20 101067.0 0.000198 False 40137 0.000144 True
2178 OREGON Clatsop County, Oregon 41007 6 78001 9 78114 15 78057.5 0.000192 False 41007 0.000135 True
2195 OREGON Lincoln County, Oregon 41041 9 86243 8 86079 17 86161.0 0.000197 False 41041 0.000140 True
2221 PENNSYLVANIA Cambria County, Pennsylvania 42021 30 421206 32 392465 62 406835.5 0.000152 False 42021 0.000140 True
2226 PENNSYLVANIA Clarion County, Pennsylvania 42031 7 104518 11 101026 18 102772.0 0.000175 False 42031 0.000134 True
2238 PENNSYLVANIA Franklin County, Pennsylvania 42055 21 271832 21 278687 42 275259.5 0.000153 False 42055 0.000136 True
2245 PENNSYLVANIA Lackawanna County, Pennsylvania 42069 37 521701 50 510393 87 516047.0 0.000169 False 42069 0.000155 True
2254 PENNSYLVANIA Mifflin County, Pennsylvania 42087 19 110161 6 109290 25 109725.5 0.000228 False 42087 0.000160 True
2258 PENNSYLVANIA Northampton County, Pennsylvania 42095 37 536374 42 557684 79 547029.0 0.000144 False 42095 0.000136 True
2264 PENNSYLVANIA Schuylkill County, Pennsylvania 42107 30 374792 25 364165 55 369478.5 0.000149 False 42107 0.000136 True
2266 PENNSYLVANIA Somerset County, Pennsylvania 42111 15 196243 15 190889 30 193566.0 0.000155 False 42111 0.000133 True
2281 RHODE.ISLAND Providence County, Rhode Island 44007 83 1265359 91 1276840 174 1271099.5 0.000137 False 44007 0.000134 True
2283 SOUTH.CAROLINA Abbeville County, South Carolina 45001 5 37296 5 38019 10 37657.5 0.000266 True 45001 0.000138 True
2308 SOUTH.CAROLINA Horry County, South Carolina 45051 20 214099 18 267675 38 240887.0 0.000158 False 45051 0.000138 True
2434 TENNESSEE Henry County, Tennessee 47079 6 61818 9 60913 15 61365.5 0.000244 False 47079 0.000148 True
2447 TENNESSEE Loudon County, Tennessee 47105 6 70000 9 72153 15 71076.5 0.000211 False 47105 0.000140 True
2497 TEXAS Austin County, Texas 48015 8 40725 4 42896 12 41810.5 0.000287 True 48015 0.000147 True
2578 TEXAS Gonzales County, Texas 48177 5 39364 8 39825 13 39594.5 0.000328 True 48177 0.000155 True
2728 TEXAS Washington County, Texas 48477 5 45594 7 49721 12 47657.5 0.000252 True 48477 0.000142 True
2739 TEXAS Wood County, Texas 48499 10 56053 3 62572 13 59312.5 0.000219 False 48499 0.000138 True
3030 WISCONSIN Douglas County, Wisconsin 55031 12 105866 10 98772 22 102319.0 0.000215 False 55031 0.000152 True
3074 WISCONSIN Sheboygan County, Wisconsin 55117 16 244470 22 244389 38 244429.5 0.000155 False 55117 0.000137 True
3077 WISCONSIN Vernon County, Wisconsin 55123 7 64887 6 64001 13 64444.0 0.000202 False 55123 0.000134 True
3084 WISCONSIN Waushara County, Wisconsin 55137 5 45859 8 46906 13 46382.5 0.000280 True 55137 0.000149 True

Clearly now there are many counties with large populations in the list. We can plot these counties now on the US map as before.

In [16]:
#Plotting counties with high Bayesian estimates of kidney cancer rates on the US map
import plotly.express as px
#First convert fips to standard form:
d['modfips'] = d['fips'].apply(lambda x: str(x).zfill(5))
#Plotting the counties with high cancer proportion:
fig = px.choropleth(
    d, 
    geojson="https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json", # This is a dataset containing county geometries
    locations='modfips', # Use the 'fips' column for county identification
    color='bayeshigh',
    hover_name='Location',  # Display the 'Location' column on hover
    scope="usa",
    title="Top 100 kidney cancer death rate counties (in terms of Bayes estimates)",
    color_continuous_scale="YlOrRd"
)
fig.update_geos(fitbounds="locations")
fig.update_layout(margin={"r":0,"t":40,"l":0,"b":0})
fig.show()
In [17]:
#Look at the states for the high counties (under both the naive and Bayes estimates)
counts = d.groupby('state').agg({'cancerhigh': 'sum', 'bayeshigh': 'sum'})
print(counts)
                      cancerhigh  bayeshigh
state                                      
ALABAMA                        1          1
ARIZONA                        0          3
ARKANSAS                       2          2
CALIFORNIA                     0          2
COLORADO                       1          0
CONNECTICUT                    0          1
DELAWARE                       0          0
DISTRICT.OF.COLUMBIA           0          0
FLORIDA                        0         14
GEORGIA                        4          0
HAWAII                         0          0
IDAHO                          2          0
ILLINOIS                       3          6
INDIANA                        0          2
IOWA                           5          6
KANSAS                         6          3
KENTUCKY                       2          1
LOUISIANA                      0          3
MAINE                          0          0
MARYLAND                       0          1
MASSACHUSETTS                  0          1
MICHIGAN                       4          4
MINNESOTA                      1          7
MISSISSIPPI                    8          0
MISSOURI                       7          6
MONTANA                        6          0
NEBRASKA                       5          1
NEVADA                         1          0
NEW.HAMPSHIRE                  0          0
NEW.JERSEY                     0          3
NEW.MEXICO                     0          0
NEW.YORK                       0          1
NORTH.CAROLINA                 2          3
NORTH.DAKOTA                   4          0
OHIO                           0          0
OKLAHOMA                       5          6
OREGON                         2          2
PENNSYLVANIA                   0          8
RHODE.ISLAND                   0          1
SOUTH.CAROLINA                 1          2
SOUTH.DAKOTA                   7          0
TENNESSEE                      1          2
TEXAS                         13          4
UTAH                           0          0
VERMONT                        0          0
VIRGINIA                       4          0
WASHINGTON                     0          0
WEST.VIRGINIA                  1          0
WISCONSIN                      2          4
WYOMING                        0          0

Clearly, there are quite many differences between the two estimates.

Let us now see how the Bayes rule converts prior to posterior for some specific counties.

In [18]:
#Consider the county in row 1681 of the dataset:
print(d.iloc[1681])
state                                NEBRASKA
Location           McPherson County, Nebraska
fips                                    31117
dc                                          0
pop                                      1504
dc.2                                        1
pop.2                                    1392
dct                                         1
popm                                   1448.0
naiveproportion                      0.000691
cancerhigh                               True
modfips                                 31117
bayesestimate                        0.000105
bayeshigh                               False
Name: 1681, dtype: object
In [19]:
#This county has low population (1448) and the naiveproportion is 0.000691 (which is roughly seven times the typical value of 0.0001)
#So this has been flagged as a high risk county by the naive estimates
#Note though that this is not a high risk county according to the Bayesian estimate
x = np.arange(0, 8e-4, 1e-6)
Tot = round(d.loc[1681, 'popm'])
Pos = d.loc[1681, 'dct']
Neg = Tot - Pos
plt.plot(x, beta.pdf(x, a, b), color = 'red', label = 'Prior')
plt.plot(x, beta.pdf(x, a + Pos, b + Neg), label = 'Posterior')
plt.axvline(x = Pos/Tot, color = 'blue') #line at the naive proportion
plt.legend()
plt.show()
#Basically there is very little difference between the prior and the posterior
#In other words, the Bayesian posterior is largely ignoring the "review" data and only going by the prior
#This makes sense because there is not much information from the observed count for such a small county.
No description has been provided for this image
In [20]:
#Next let us look at the county in row 1624. This is Arthur County again Nebraska
#It has a small population of 1233 and death count of 0. 
print(d.iloc[1624])
x = np.arange(0, 8e-4, 1e-6)
Tot = round(d.loc[1624, 'popm'])
Pos = d.loc[1624, 'dct']
Neg = Tot - Pos
plt.plot(x, beta.pdf(x, a, b), color = 'red', label = 'Prior')
plt.plot(x, beta.pdf(x, a + Pos, b + Neg), label = 'Posterior')
plt.axvline(x = Pos/Tot, color = 'blue') #line at the naive proportion
plt.legend()
plt.show()
#Similar story here; not much difference between prior and posterior
state                             NEBRASKA
Location           Arthur County, Nebraska
fips                                 31005
dc                                       0
pop                                   1295
dc.2                                     0
pop.2                                 1172
dct                                      0
popm                                1233.5
naiveproportion                        0.0
cancerhigh                           False
modfips                              31005
bayesestimate                     0.000097
bayeshigh                            False
Name: 1624, dtype: object
No description has been provided for this image
In [21]:
print(d.iloc[346])
#This is a large county (population around half a million) and a large death count (120)
x = np.arange(0, 8e-4, 1e-6)
Tot = round(d.loc[346, 'popm'])
Pos = d.loc[346, 'dct']
Neg = Tot - Pos
plt.plot(x, beta.pdf(x, a, b), color = 'red', label = 'Prior')
plt.plot(x, beta.pdf(x, a + Pos, b + Neg), label = 'Posterior')
plt.axvline(x = Pos/Tot, color = 'blue') #line at the naive proportion
plt.legend()
plt.show()
#The Bayesian posterior moves quite a bit in the direction of the frequentist estimate
#The frequentist analysis does not flag this as a high-risk county because there were many low population counties
#which dominated its proportion. 
state                               FLORIDA
Location           Sarasota County, Florida
fips                                  12115
dc                                       55
pop                                  485595
dc.2                                     65
pop.2                                573711
dct                                     120
popm                               529653.0
naiveproportion                    0.000227
cancerhigh                            False
modfips                               12115
bayesestimate                      0.000203
bayeshigh                              True
Name: 346, dtype: object
No description has been provided for this image

Bayesian estimates are often said to perform "Shrinkage". They shrink the frequentist estimates towards the prior. Overall, the variability of Bayesian estimates will be quite a bit smaller compared to frequentist estimates.

In [22]:
h1, _, _ = plt.hist(d['naiveproportion'], bins=100, color='blue', label='Naive Estimate', alpha=0.5)
h2, _, _ = plt.hist(d['bayesestimate'], bins=100, color='red', label='Bayes Estimate', alpha=0.5)

plt.title("Histograms of the Naive and Bayesian estimates")
plt.xlabel("Estimates")
plt.legend() 
plt.show()
No description has been provided for this image

This shrinkage is one reason why the Bayes method was able to flag Sarasota County, Florida as a high risk county even though its estimate was only 0.000203; this was already extreme compared to the rest of the Bayesian estimates.

Recall that the original dataset had two time periods 80-84 and 85-89. We now repeat the analysis only for the first time period and attempt to predict the proportions for the second time period. We can compare the performances of the naive proportions and the Bayes estimates in this prediction problem.

In [23]:
d['naiveproportion1'] = d['dc'] / d['pop'] #these are the naive estimates

#For the Bayes estimates, we first need to estimate the prior:
proportions1_largecounties = d['naiveproportion1'][d['pop'] >= 300000] #filter out the high population counties
m1 = np.mean(proportions1_largecounties)
V1 = np.var(proportions1_largecounties)
a1 = ((m1*m1*(1-m1))/V1) - m1
b1 = (((1-m1)*(1-m1)*m1)/V1) - (1-m1)

d['bayesestimate1'] = (d['dc'] + a1)/(d['pop'] + a1 + b1)

#Proportions from the second half of the data:
d['proportion2'] = d['dc.2'] / d['pop.2']

#Prediction Accuracy:
sum_diff_naiveproportion1 = d['naiveproportion1'] - d['proportion2']
sum_abs_diff_naiveproportion1 = sum_diff_naiveproportion1.abs().sum()

sum_diff_bayesestimate1 = d['bayesestimate1'] - d['proportion2']
sum_abs_diff_bayesestimate1 = sum_diff_bayesestimate1.abs().sum()

bothresults = [sum_abs_diff_naiveproportion1, sum_abs_diff_bayesestimate1]
print(bothresults)
[0.12508482755988196, 0.10530340006718594]

There is thus an improvement (slight) in the prediction accuracy using Bayes estimates as opposed to the naive estimates, in terms of the sum of absolute deviations (note that other comparison metrics might show different results).