Lecture Seven: Beta-Binomial Inference¶
Beta Densities¶
Let us start today by noting some general properties of Beta densities. For two positive real numbers $a$ and $b$, the $\text{Beta}(a, b)$ density is given by: \begin{align*} f_{\text{Beta}(a, b)}(u) = \frac{u^{a-1} (1-u)^{b-1} I\{0 \leq u \leq 1\}}{\int_0^1 u^{a-1} (1-u)^{b-1} du} \end{align*} The numbers $a$ and $b$ control the shape of the Beta density. The following facts on the $\text{Beta}(a, b)$ distribution will be quite useful for rest of today's lecture:
- The simplest Beta density is the uniform density corresponding to $a = 1$ and $b = 1$ which is flat on $[0, 1]$.
- The mean of the Beta distribution is $\frac{a}{a+b}$. For example, when $a = 10$ and $b = 30$, the mean equals $0.25$. On the other hand, when $a = 30$ and $b = 10$, the mean equals $0.75$.
- The variance of the Beta distribution is $\frac{ab}{(a+b)^2(a+b+1)} = \frac{a}{a+b} \frac{b}{a+b} \frac{1}{a+b+1}$. Thus when $a+b$ is large, the variance will be small and the Beta density will look quite skinny around its mean.
You can learn a lot more about Beta Densities from its wikipedia page.
#Plotting Beta densities:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta
a = 10
b = 100000
#x = np.linspace(0, 1, 1000)
x = np.arange(0, 8e-4, 1e-6)
y = beta.pdf(x, a, b)
plt.plot(x, y, label = f"Beta({a}, {b})")
plt.title(f'Beta Density for a = {a} and b = {b}')
plt.xlabel("x")
plt.ylabel("Density")
plt.legend()
plt.show()
Recap of Last Lecture¶
In the last lecture, we focused on the following problem: estimate the quality of a product from reviews. We used $\theta$ to denote the quality of the product. The reviews are summarized by the number of positive reviews ($Pos$) and the number of negative reviews ($Neg$); the total number of reviews is $Tot = Pos + Neg$. The likelihood that we worked with was the Bernoulli likelihood given by: \begin{align*} \mathbb{P} \{\text{reviews} \mid \theta = u \} = u^{Pos} (1-u)^{Neg}. \end{align*} The naive estimate for $\theta$ is the proportion $\frac{Pos}{Tot} = \frac{Pos}{Pos + Neg}$. This may be reasonable when the number of reviews is large but quite unreasonable when the number of reviews is small. For example, when $Pos = 3$, $Neg = 0$ (so that $Tot = 3$), the proportion is 1 which is clearly not a good estimate of $\theta$ (we do not want to infer that the microwave is perfect based on only three reviews). In the sequel, the names "naive estimate", "proportion", "naive proportion", "MLE", "Frequentist Estimate" all refer to the same thing.
The Bayesian approach uses probability and supplements the likelihood with a prior on $\theta$. The prior reflects the uncertainty in $\theta$ before observing review data. In the last class, we worked with the uniform prior: $\theta \sim \text{Uniform}[0, 1]$. We saw that the posterior is given by $\text{Beta}(1 + Pos, 1 + Neg)$. In other words, \begin{align*} \text{uniform prior} ~~ and ~~ \text{Bernoulli likelihood} ~~ \implies ~~ \text{Beta}(1 + Pos, 1 + Neg) \text{ posterior} \end{align*}
More General Priors¶
Because the uniform prior is also $\text{Beta}(1, 1)$, we can rewrite the above as: \begin{align*} \text{Beta}(1, 1) \text{ prior} ~~ + ~~ \text{Bernoulli likelihood} ~~ \implies ~~ \text{Beta}(1 + Pos, 1 + Neg) \text{ posterior} \end{align*} The above fact actually works if we replace 1 and 1 by any positive $a$ and $b$. In other words, \begin{align*} \text{Beta}(a, b) \text{ prior} ~~ + ~~ \text{Bernoulli likelihood} ~~ \implies ~~ \text{Beta}(a + Pos, b + Neg) \text{ posterior} \end{align*} We can take advantage of this and work with non-uniform priors. For example, suppose we believe that products in this specific marketplace are never of quality below 0.8 and typically of quality 0.9, we can incorporate this as a $\text{Beta}(a, b)$ prior as follows. We set the mean $\frac{a}{a+b}$ to be equal to 0.9 and the standard deviation to be equal to $0.05$ (so that two standard deviations cover the gap between the typical value 0.9 and the realistic lowest value $0.8$): \begin{align*} \frac{a}{a+b} = 0.9 ~~ \text{ and } ~~ \frac{a}{a+b} \frac{b}{a+b} \frac{1}{a+b+1} = (0.05)^2 = 0.0025 \end{align*} The first equation above gives $a = 9 b$ and the second equation gives \begin{align*} \frac{0.9 \times 0.1}{a+b+1} = 0.0025 \implies a+b = 35. \end{align*} Combining with $a = 9b$, we deduce $a = 31.5$ and $b = 3.5$. This prior and the corresponding posterior for specific values of $Pos$ and $Neg$ can be plotted as follows.
# Define the range of values
x = np.linspace(0, 1, 1000)
a = 31.5
b = 3.5
pos = 3
neg = 0
pdf1 = beta.pdf(x, a, b) #prior density
pdf2 = beta.pdf(x, a + pos, b + neg) #posterior density
# Plot the densities
plt.figure(figsize=(10, 6))
plt.plot(x, pdf1, label='Prior', color='blue')
plt.plot(x, pdf2, label='Posterior', color='red')
plt.title('Prior and Posterior')
plt.xlabel('x')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()
#Repeat these plots for other values of pos and neg such as pos = 19, neg = 1 and also for pos = 1, neg = 19
Comparison between Naive Estimates (proportions) and Bayesian Estimates¶
Recall again that \begin{align*} \text{naive estimate} = \text{proportion} = \frac{Pos}{Tot} = \frac{Pos}{Pos+Neg} \end{align*} The Bayesian does not use the naive estimate but instead summarizes the posterior uncertainty in the form of the $\text{Beta}(a + Pos, b + Neg)$ density. One commonly used Bayesian Estimate is the posterior mean which is \begin{align*} \text{Bayesian estimate} = \frac{a + Pos}{a + b + Tot} = \frac{a + Pos}{a + b + Pos + Neg} \end{align*} These posterior means can be much better summaries for the quality of each microwave compared to the naive proportion. For example, when $Pos = 3$ and $Neg = 0$ the posterior mean becomes $\frac{a+3}{a+b+3}$ which, for the uniform prior ($a = b = 1$), becomes $\frac{4}{5} = 0.8$. $0.8$ is probably a more reasonable estimate for $\theta$ in this case compared to the the naive proportion which equals 1.
The following is a very insightful equation relating the naive estimate and the Bayesian estimate (this relation also involves the prior mean $\frac{a}{a+b}$): \begin{align*} \frac{a + Pos}{a + b + Tot} = \left(\frac{a+b}{a+b+Tot} \right) \frac{a}{a+b} + \left(\frac{Tot}{a+b+Tot} \right) \frac{Pos}{Tot} \end{align*} which is equivalent to \begin{align*} \text{Bayesian Estimate} = \left(\frac{a+b}{a+b+Tot} \right) \text{Prior Mean} + \left(\frac{Tot}{a+b+Tot} \right) \text{Naive Estimate} \end{align*} Letting \begin{align*} w = \frac{a+b}{a+b+Tot}, \end{align*} we can rewrite the equation as \begin{align*} \text{Bayesian Estimate} = w \times \text{Prior Mean} + \left(1- w\right) \times \text{Naive Estimate} \end{align*} This leads to following observations:
- The Bayesian estimate is a weighted average of the prior mean and the naive estimate
- If the prior mean and the naive estimate are close to each other, then the Bayesian estimate will also be close to the naive estimate (as well as the prior mean)
- If the prior mean and the naive estimate are different, then the Bayesian estimate will push the naive estimate towards the prior mean. The amount by which the Bayesian estimate will be pushed towards the prior mean will depend on the size of the weight $w = \frac{a+b}{a+b+Tot}$. If this weight is close to 1, then the Bayesian estimate will be very close to the prior mean. On the other hand, if this weight is close to 0, the Bayesian estimate will remain close to the naive estimate.
- The size of $Tot$ (total number of observations or, in the specific product quality estimation example, the total nunber of reviews) relative to $a+b$ controls the size of the weight $w$. If $Tot$ is large relative to $a+b$, then the weight is close to 0.
Kidney Cancer Dataset¶
Now we shall study a real dataset where naive proportions can be uninformative or even misleading. This is an example on "Bayesian Disease Mapping", and is taken from Section 2.7 on estimation of kidney cancer rates of the book "Bayesian Data Analysis Edition 3" by Gelman et al. Some of the code is taken from a blog post by Robin Ryder (google this).
#Read the kidney cancer dataset
import pandas as pd
d_full = pd.read_csv("KidneyCancerClean.csv", skiprows=4)
display(d_full.head(15))
summary = d_full.describe(include = 'all')
display(summary)
Unnamed: 0 | state | Location | fips | dc | dcV | pop | popV | aadc | aadcV | ... | good | dc.2 | dcV.2 | pop.2 | popV.2 | aadc.2 | aadcV.2 | dcC.2 | dcCV.2 | good.2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 108 | ALABAMA | Autauga County, Alabama | 1001 | 2 | VALID | 61921 | VALID | 3.3 | UNRELIABLE | ... | 1 | 1 | VALID | 64915 | VALID | 1.7 | UNRELIABLE | 1.5 | UNRELIABLE | 1 |
1 | 117 | ALABAMA | Baldwin County, Alabama | 1003 | 7 | VALID | 170945 | VALID | 3.5 | UNRELIABLE | ... | 1 | 15 | VALID | 195253 | VALID | 6.3 | UNRELIABLE | 7.7 | UNRELIABLE | 1 |
2 | 129 | ALABAMA | Barbour County, Alabama | 1005 | 0 | VALID | 33316 | VALID | 0.0 | UNRELIABLE | ... | 1 | 1 | VALID | 33987 | VALID | 2.6 | UNRELIABLE | 2.9 | UNRELIABLE | 1 |
3 | 199 | ALABAMA | Bibb County, Alabama | 1007 | 0 | VALID | 30152 | VALID | 0.0 | UNRELIABLE | ... | 1 | 1 | VALID | 31175 | VALID | 2.9 | UNRELIABLE | 3.2 | UNRELIABLE | 1 |
4 | 219 | ALABAMA | Blount County, Alabama | 1009 | 3 | VALID | 88342 | VALID | 3.2 | UNRELIABLE | ... | 1 | 5 | VALID | 91547 | VALID | 4.8 | UNRELIABLE | 5.5 | UNRELIABLE | 1 |
5 | 306 | ALABAMA | Bullock County, Alabama | 1011 | 0 | VALID | 8313 | VALID | 0.0 | UNRELIABLE | ... | 1 | 0 | VALID | 8197 | VALID | 0.0 | UNRELIABLE | 0.0 | UNRELIABLE | 1 |
6 | 318 | ALABAMA | Butler County, Alabama | 1013 | 0 | VALID | 31963 | VALID | 0.0 | UNRELIABLE | ... | 1 | 1 | VALID | 31722 | VALID | 2.8 | UNRELIABLE | 3.2 | UNRELIABLE | 1 |
7 | 343 | ALABAMA | Calhoun County, Alabama | 1015 | 9 | VALID | 243105 | VALID | 4.0 | UNRELIABLE | ... | 1 | 12 | VALID | 233021 | VALID | 5.4 | UNRELIABLE | 5.1 | UNRELIABLE | 1 |
8 | 439 | ALABAMA | Chambers County, Alabama | 1017 | 7 | VALID | 59985 | VALID | 8.3 | UNRELIABLE | ... | 1 | 0 | VALID | 57813 | VALID | 0.0 | UNRELIABLE | 0.0 | UNRELIABLE | 1 |
9 | 467 | ALABAMA | Cherokee County, Alabama | 1019 | 0 | VALID | 43401 | VALID | 0.0 | UNRELIABLE | ... | 1 | 0 | VALID | 43828 | VALID | 0.0 | UNRELIABLE | 0.0 | UNRELIABLE | 1 |
10 | 490 | ALABAMA | Chilton County, Alabama | 1021 | 2 | VALID | 65792 | VALID | 3.1 | UNRELIABLE | ... | 1 | 3 | VALID | 68837 | VALID | 3.8 | UNRELIABLE | 4.4 | UNRELIABLE | 1 |
11 | 496 | ALABAMA | Choctaw County, Alabama | 1023 | 1 | VALID | 22741 | VALID | 3.7 | UNRELIABLE | ... | 1 | 3 | VALID | 22441 | VALID | 11.4 | UNRELIABLE | 13.4 | UNRELIABLE | 1 |
12 | 527 | ALABAMA | Clarke County, Alabama | 1025 | 1 | VALID | 38611 | VALID | 2.5 | UNRELIABLE | ... | 1 | 3 | VALID | 37772 | VALID | 6.7 | UNRELIABLE | 7.9 | UNRELIABLE | 1 |
13 | 533 | ALABAMA | Clay County, Alabama | 1027 | 3 | VALID | 27846 | VALID | 8.1 | UNRELIABLE | ... | 1 | 3 | VALID | 26954 | VALID | 8.9 | UNRELIABLE | 11.1 | UNRELIABLE | 1 |
14 | 557 | ALABAMA | Cleburne County, Alabama | 1029 | 2 | VALID | 29718 | VALID | 5.9 | UNRELIABLE | ... | 1 | 1 | VALID | 29735 | VALID | 3.1 | UNRELIABLE | 3.4 | UNRELIABLE | 1 |
15 rows × 22 columns
Unnamed: 0 | state | Location | fips | dc | dcV | pop | popV | aadc | aadcV | ... | good | dc.2 | dcV.2 | pop.2 | popV.2 | aadc.2 | aadcV.2 | dcC.2 | dcCV.2 | good.2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 3110.000000 | 3110 | 3110 | 3110.000000 | 3110.000000 | 3110 | 3.110000e+03 | 3110 | 3110.000000 | 3110 | ... | 3110.000000 | 3110.000000 | 3110 | 3.110000e+03 | 3110 | 3110.000000 | 3110 | 3109.000000 | 3109 | 3110.000000 |
unique | NaN | 50 | 3109 | NaN | NaN | 1 | NaN | 1 | NaN | 2 | ... | NaN | NaN | 1 | NaN | 1 | NaN | 3 | NaN | 2 | NaN |
top | NaN | TEXAS | Bedford County, Virginia | NaN | NaN | VALID | NaN | VALID | NaN | UNRELIABLE | ... | NaN | NaN | VALID | NaN | VALID | NaN | UNRELIABLE | NaN | UNRELIABLE | NaN |
freq | NaN | 254 | 2 | NaN | NaN | 3110 | NaN | 3110 | NaN | 2882 | ... | NaN | NaN | 3110 | NaN | 3110 | NaN | 2848 | NaN | 2848 | NaN |
mean | 1555.500000 | NaN | NaN | 30672.690675 | 7.527974 | NaN | 1.550087e+05 | NaN | 4.438392 | NaN | ... | 0.985852 | 8.359164 | NaN | 1.606611e+05 | NaN | 4.834244 | NaN | 5.787359 | NaN | 0.985852 |
std | 897.923995 | NaN | NaN | 14980.587745 | 23.095402 | NaN | 4.722345e+05 | NaN | 3.501390 | NaN | ... | 0.118120 | 23.765728 | NaN | 4.972750e+05 | NaN | 3.709540 | NaN | 5.030156 | NaN | 0.118120 |
min | 1.000000 | NaN | NaN | 1001.000000 | 0.000000 | NaN | 2.580000e+02 | NaN | 0.000000 | NaN | ... | 0.000000 | 0.000000 | NaN | 2.670000e+02 | NaN | 0.000000 | NaN | 0.000000 | NaN | 0.000000 |
25% | 778.250000 | NaN | NaN | 19041.500000 | 1.000000 | NaN | 2.238250e+04 | NaN | 2.100000 | NaN | ... | 1.000000 | 1.000000 | NaN | 2.213000e+04 | NaN | 2.525000 | NaN | 2.800000 | NaN | 1.000000 |
50% | 1555.500000 | NaN | NaN | 29210.000000 | 2.000000 | NaN | 4.762800e+04 | NaN | 4.300000 | NaN | ... | 1.000000 | 3.000000 | NaN | 4.806950e+04 | NaN | 4.600000 | NaN | 5.200000 | NaN | 1.000000 |
75% | 2332.750000 | NaN | NaN | 46008.500000 | 6.000000 | NaN | 1.146262e+05 | NaN | 6.100000 | NaN | ... | 1.000000 | 7.000000 | NaN | 1.179355e+05 | NaN | 6.600000 | NaN | 7.700000 | NaN | 1.000000 |
max | 3110.000000 | NaN | NaN | 56045.000000 | 647.000000 | NaN | 1.526640e+07 | NaN | 49.800000 | NaN | ... | 1.000000 | 598.000000 | NaN | 1.660789e+07 | NaN | 44.600000 | NaN | 71.800000 | NaN | 1.000000 |
11 rows × 22 columns
This is county-level data on population and deaths due to kidney cancer. There are quite a few columns in the dataset but we shall only work with the four columns: dc, dc.2, pop and pop.2:
- The columns dc and dc.2 represent death counts due to kidney cancer in 1980-84 and 1985-89 respectively.
- The columns pop and pop.2 represent some measure of the average population in 1980-84 and 1985-89 respectively.
Let us create a smaller dataframe with only these columns.
d = d_full[['state', 'Location', 'fips', 'dc', 'pop', 'dc.2', 'pop.2']].copy()
display(d.head(20))
state | Location | fips | dc | pop | dc.2 | pop.2 | |
---|---|---|---|---|---|---|---|
0 | ALABAMA | Autauga County, Alabama | 1001 | 2 | 61921 | 1 | 64915 |
1 | ALABAMA | Baldwin County, Alabama | 1003 | 7 | 170945 | 15 | 195253 |
2 | ALABAMA | Barbour County, Alabama | 1005 | 0 | 33316 | 1 | 33987 |
3 | ALABAMA | Bibb County, Alabama | 1007 | 0 | 30152 | 1 | 31175 |
4 | ALABAMA | Blount County, Alabama | 1009 | 3 | 88342 | 5 | 91547 |
5 | ALABAMA | Bullock County, Alabama | 1011 | 0 | 8313 | 0 | 8197 |
6 | ALABAMA | Butler County, Alabama | 1013 | 0 | 31963 | 1 | 31722 |
7 | ALABAMA | Calhoun County, Alabama | 1015 | 9 | 243105 | 12 | 233021 |
8 | ALABAMA | Chambers County, Alabama | 1017 | 7 | 59985 | 0 | 57813 |
9 | ALABAMA | Cherokee County, Alabama | 1019 | 0 | 43401 | 0 | 43828 |
10 | ALABAMA | Chilton County, Alabama | 1021 | 2 | 65792 | 3 | 68837 |
11 | ALABAMA | Choctaw County, Alabama | 1023 | 1 | 22741 | 3 | 22441 |
12 | ALABAMA | Clarke County, Alabama | 1025 | 1 | 38611 | 3 | 37772 |
13 | ALABAMA | Clay County, Alabama | 1027 | 3 | 27846 | 3 | 26954 |
14 | ALABAMA | Cleburne County, Alabama | 1029 | 2 | 29718 | 1 | 29735 |
15 | ALABAMA | Coffee County, Alabama | 1031 | 3 | 80334 | 4 | 81882 |
16 | ALABAMA | Colbert County, Alabama | 1033 | 3 | 108797 | 6 | 104971 |
17 | ALABAMA | Conecuh County, Alabama | 1035 | 2 | 22009 | 2 | 20687 |
18 | ALABAMA | Coosa County, Alabama | 1037 | 2 | 18003 | 1 | 17828 |
19 | ALABAMA | Covington County, Alabama | 1039 | 4 | 76229 | 3 | 75990 |
The purpose of this data analysis is to estimate the kidney cancer death rates for each county and, more importantly, to flag the counties with high kidney cancer death rates. Before proceeding to the analysis, let us first combine the death counts for the two periods 80-84 and 85-89 to get one overall death count for each county. Further, let us take the average of pop and pop.2 to get a measure of the average population of each county in the period 80-89.
# Combine the death and population counts for the two periods 80-84 and 85-89
d['dct'] = d['dc'] + d['dc.2'] #dct stands for death count total
d['popm'] = (d['pop'] + d['pop.2']) / 2
display(d.head(15))
state | Location | fips | dc | pop | dc.2 | pop.2 | dct | popm | |
---|---|---|---|---|---|---|---|---|---|
0 | ALABAMA | Autauga County, Alabama | 1001 | 2 | 61921 | 1 | 64915 | 3 | 63418.0 |
1 | ALABAMA | Baldwin County, Alabama | 1003 | 7 | 170945 | 15 | 195253 | 22 | 183099.0 |
2 | ALABAMA | Barbour County, Alabama | 1005 | 0 | 33316 | 1 | 33987 | 1 | 33651.5 |
3 | ALABAMA | Bibb County, Alabama | 1007 | 0 | 30152 | 1 | 31175 | 1 | 30663.5 |
4 | ALABAMA | Blount County, Alabama | 1009 | 3 | 88342 | 5 | 91547 | 8 | 89944.5 |
5 | ALABAMA | Bullock County, Alabama | 1011 | 0 | 8313 | 0 | 8197 | 0 | 8255.0 |
6 | ALABAMA | Butler County, Alabama | 1013 | 0 | 31963 | 1 | 31722 | 1 | 31842.5 |
7 | ALABAMA | Calhoun County, Alabama | 1015 | 9 | 243105 | 12 | 233021 | 21 | 238063.0 |
8 | ALABAMA | Chambers County, Alabama | 1017 | 7 | 59985 | 0 | 57813 | 7 | 58899.0 |
9 | ALABAMA | Cherokee County, Alabama | 1019 | 0 | 43401 | 0 | 43828 | 0 | 43614.5 |
10 | ALABAMA | Chilton County, Alabama | 1021 | 2 | 65792 | 3 | 68837 | 5 | 67314.5 |
11 | ALABAMA | Choctaw County, Alabama | 1023 | 1 | 22741 | 3 | 22441 | 4 | 22591.0 |
12 | ALABAMA | Clarke County, Alabama | 1025 | 1 | 38611 | 3 | 37772 | 4 | 38191.5 |
13 | ALABAMA | Clay County, Alabama | 1027 | 3 | 27846 | 3 | 26954 | 6 | 27400.0 |
14 | ALABAMA | Cleburne County, Alabama | 1029 | 2 | 29718 | 1 | 29735 | 3 | 29726.5 |
A simple way to estimate the kidney cancer rate of each county is to use the proportion: \begin{align*} \text{Naive Estimate of Kidney Cancer Death Rate} = \frac{\text{Number of Kidney Cancer Deaths}}{\text{Population}} \end{align*} We can then rank all the counties according to their naive proportions and flag the counties for whom the naive proportion is high.
d['naiveproportion'] = d['dct'] / d['popm']
summary = d['naiveproportion'].describe()
print(summary)
#Histogram of naive proportion estimates:
import matplotlib.pyplot as plt
plt.hist(d['naiveproportion'], bins=300)
plt.xlabel('naiveproportion')
plt.ylabel('Frequency')
plt.title('Histogram of Naive Proportions')
plt.grid(True)
plt.show()
#Some observations on the histogram:
#There is a spike at zero (corresponding to counties with no kidney cancer deaths)
#Most proportions are around 0.0001 but there are a few counties which have large proportions (more than 0.0004)
count 3110.000000 mean 0.000108 std 0.000068 min 0.000000 25% 0.000067 50% 0.000100 75% 0.000139 max 0.000691 Name: naiveproportion, dtype: float64
#Flagging Counties with High Kidney Cancer Death Proportions.
#One way to do this is to get the counties with the 100 highest naive proportion:
threshold = d['naiveproportion'].nlargest(100).iloc[-1]
print(threshold)
d['cancerhigh'] = d['naiveproportion'] >= threshold
with pd.option_context('display.max_rows', None):
display(d[d['cancerhigh'] >= True])
0.00025143527636927463
state | Location | fips | dc | pop | dc.2 | pop.2 | dct | popm | naiveproportion | cancerhigh | |
---|---|---|---|---|---|---|---|---|---|---|---|
65 | ALABAMA | Wilcox County, Alabama | 1131 | 2 | 11144 | 1 | 10727 | 3 | 10935.5 | 0.000274 | True |
148 | ARKANSAS | Sharp County, Arkansas | 5135 | 4 | 34922 | 5 | 33806 | 9 | 34364.0 | 0.000262 | True |
149 | ARKANSAS | Stone County, Arkansas | 5137 | 1 | 22387 | 5 | 23148 | 6 | 22767.5 | 0.000264 | True |
230 | COLORADO | Dolores County, Colorado | 8033 | 1 | 4151 | 1 | 3797 | 2 | 3974.0 | 0.000503 | True |
473 | GEORGIA | Quitman County, Georgia | 13239 | 0 | 2524 | 1 | 2557 | 1 | 2540.5 | 0.000394 | True |
485 | GEORGIA | Talbot County, Georgia | 13263 | 1 | 5855 | 1 | 6008 | 2 | 5931.5 | 0.000337 | True |
495 | GEORGIA | Treutlen County, Georgia | 13283 | 1 | 9944 | 3 | 9695 | 4 | 9819.5 | 0.000407 | True |
512 | GEORGIA | Wilkes County, Georgia | 13317 | 3 | 14334 | 1 | 14223 | 4 | 14278.5 | 0.000280 | True |
535 | IDAHO | Clark County, Idaho | 16033 | 1 | 2212 | 0 | 2176 | 1 | 2194.0 | 0.000456 | True |
549 | IDAHO | Lewis County, Idaho | 16061 | 2 | 9705 | 1 | 8634 | 3 | 9169.5 | 0.000327 | True |
570 | ILLINOIS | Carroll County, Illinois | 17015 | 5 | 44858 | 7 | 41446 | 12 | 43152.0 | 0.000278 | True |
575 | ILLINOIS | Clay County, Illinois | 17025 | 8 | 37117 | 5 | 35052 | 13 | 36084.5 | 0.000360 | True |
595 | ILLINOIS | Hamilton County, Illinois | 17065 | 2 | 21935 | 4 | 20838 | 6 | 21386.5 | 0.000281 | True |
793 | IOWA | Greene County, Iowa | 19073 | 4 | 28283 | 4 | 25160 | 8 | 26721.5 | 0.000299 | True |
836 | IOWA | Ringgold County, Iowa | 19159 | 2 | 14228 | 3 | 13013 | 5 | 13620.5 | 0.000367 | True |
839 | IOWA | Shelby County, Iowa | 19165 | 4 | 36486 | 5 | 33239 | 9 | 34862.5 | 0.000258 | True |
844 | IOWA | Union County, Iowa | 19175 | 7 | 32300 | 2 | 30043 | 9 | 31171.5 | 0.000289 | True |
854 | IOWA | Worth County, Iowa | 19195 | 1 | 21765 | 5 | 20094 | 6 | 20929.5 | 0.000287 | True |
868 | KANSAS | Clark County, Kansas | 20025 | 2 | 6192 | 0 | 5877 | 2 | 6034.5 | 0.000331 | True |
882 | KANSAS | Ellsworth County, Kansas | 20053 | 2 | 15935 | 2 | 15846 | 4 | 15890.5 | 0.000252 | True |
892 | KANSAS | Greenwood County, Kansas | 20073 | 3 | 20883 | 3 | 19326 | 6 | 20104.5 | 0.000298 | True |
935 | KANSAS | Rice County, Kansas | 20159 | 5 | 27283 | 5 | 25668 | 10 | 26475.5 | 0.000378 | True |
951 | KANSAS | Sumner County, Kansas | 20191 | 5 | 60141 | 12 | 60620 | 17 | 60380.5 | 0.000282 | True |
959 | KANSAS | Woodson County, Kansas | 20207 | 0 | 11148 | 3 | 10227 | 3 | 10687.5 | 0.000281 | True |
1025 | KENTUCKY | Lee County, Kentucky | 21129 | 2 | 18961 | 3 | 18278 | 5 | 18619.5 | 0.000269 | True |
1071 | KENTUCKY | Trigg County, Kentucky | 21221 | 1 | 20577 | 5 | 21924 | 6 | 21250.5 | 0.000282 | True |
1199 | MICHIGAN | Alcona County, Michigan | 26001 | 2 | 23953 | 5 | 24695 | 7 | 24324.0 | 0.000288 | True |
1246 | MICHIGAN | Luce County, Michigan | 26095 | 2 | 14850 | 3 | 13410 | 5 | 14130.0 | 0.000354 | True |
1258 | MICHIGAN | Montmorency County, Michigan | 26119 | 2 | 18711 | 3 | 20070 | 5 | 19390.5 | 0.000258 | True |
1270 | MICHIGAN | Roscommon County, Michigan | 26143 | 5 | 41957 | 6 | 44959 | 11 | 43458.0 | 0.000253 | True |
1316 | MINNESOTA | Kittson County, Minnesota | 27069 | 1 | 16438 | 3 | 14973 | 4 | 15705.5 | 0.000255 | True |
1372 | MISSISSIPPI | Attala County, Mississippi | 28007 | 5 | 27969 | 2 | 27225 | 7 | 27597.0 | 0.000254 | True |
1376 | MISSISSIPPI | Carroll County, Mississippi | 28015 | 1 | 12952 | 3 | 13422 | 4 | 13187.0 | 0.000303 | True |
1379 | MISSISSIPPI | Claiborne County, Mississippi | 28021 | 0 | 7405 | 2 | 5844 | 2 | 6624.5 | 0.000302 | True |
1400 | MISSISSIPPI | Jefferson County, Mississippi | 28063 | 0 | 3927 | 2 | 3305 | 2 | 3616.0 | 0.000553 | True |
1403 | MISSISSIPPI | Kemper County, Mississippi | 28069 | 2 | 11047 | 1 | 10990 | 3 | 11018.5 | 0.000272 | True |
1417 | MISSISSIPPI | Montgomery County, Mississippi | 28097 | 0 | 18294 | 6 | 17060 | 6 | 17677.0 | 0.000339 | True |
1428 | MISSISSIPPI | Quitman County, Mississippi | 28119 | 1 | 12630 | 2 | 11163 | 3 | 11896.5 | 0.000252 | True |
1440 | MISSISSIPPI | Tunica County, Mississippi | 28143 | 0 | 5776 | 2 | 5006 | 2 | 5391.0 | 0.000371 | True |
1457 | MISSOURI | Bates County, Missouri | 29013 | 9 | 37094 | 2 | 36524 | 11 | 36809.0 | 0.000299 | True |
1487 | MISSOURI | Gasconade County, Missouri | 29073 | 6 | 32412 | 7 | 33035 | 13 | 32723.5 | 0.000397 | True |
1512 | MISSOURI | Maries County, Missouri | 29125 | 4 | 19261 | 1 | 19702 | 5 | 19481.5 | 0.000257 | True |
1519 | MISSOURI | Monroe County, Missouri | 29137 | 2 | 22560 | 4 | 21524 | 6 | 22042.0 | 0.000272 | True |
1527 | MISSOURI | Ozark County, Missouri | 29153 | 1 | 20050 | 5 | 20674 | 6 | 20362.0 | 0.000295 | True |
1553 | MISSOURI | Shelby County, Missouri | 29205 | 1 | 18180 | 5 | 16697 | 6 | 17438.5 | 0.000344 | True |
1564 | MISSOURI | Worth County, Missouri | 29227 | 1 | 7016 | 1 | 6153 | 2 | 6584.5 | 0.000304 | True |
1575 | MONTANA | Daniels County, Montana | 30019 | 1 | 6864 | 1 | 6013 | 2 | 6438.5 | 0.000311 | True |
1583 | MONTANA | Glacier County, Montana | 30035 | 2 | 14163 | 3 | 13601 | 5 | 13882.0 | 0.000360 | True |
1585 | MONTANA | Granite County, Montana | 30039 | 0 | 6891 | 2 | 6791 | 2 | 6841.0 | 0.000292 | True |
1594 | MONTANA | McCone County, Montana | 30055 | 1 | 7136 | 1 | 6258 | 2 | 6697.0 | 0.000299 | True |
1608 | MONTANA | Roosevelt County, Montana | 30085 | 2 | 16259 | 3 | 14967 | 5 | 15613.0 | 0.000320 | True |
1620 | MONTANA | Wibaux County, Montana | 30109 | 0 | 3897 | 1 | 3216 | 1 | 3556.5 | 0.000281 | True |
1630 | NEBRASKA | Brown County, Nebraska | 31017 | 1 | 10514 | 2 | 9364 | 3 | 9939.0 | 0.000302 | True |
1635 | NEBRASKA | Cedar County, Nebraska | 31027 | 4 | 27890 | 4 | 25872 | 8 | 26881.0 | 0.000298 | True |
1651 | NEBRASKA | Fillmore County, Nebraska | 31059 | 2 | 18928 | 3 | 17637 | 5 | 18282.5 | 0.000273 | True |
1652 | NEBRASKA | Franklin County, Nebraska | 31061 | 2 | 10530 | 2 | 9843 | 4 | 10186.5 | 0.000393 | True |
1681 | NEBRASKA | McPherson County, Nebraska | 31117 | 0 | 1504 | 1 | 1392 | 1 | 1448.0 | 0.000691 | True |
1721 | NEVADA | Eureka County, Nevada | 32011 | 1 | 3360 | 0 | 3508 | 1 | 3434.0 | 0.000291 | True |
1893 | NORTH.CAROLINA | Gates County, North Carolina | 37073 | 0 | 10873 | 3 | 12298 | 3 | 11585.5 | 0.000259 | True |
1949 | NORTH.CAROLINA | Warren County, North Carolina | 37185 | 3 | 14587 | 2 | 15707 | 5 | 15147.0 | 0.000330 | True |
1962 | NORTH.DAKOTA | Bowman County, North Dakota | 38011 | 1 | 10659 | 3 | 9734 | 4 | 10196.5 | 0.000392 | True |
1970 | NORTH.DAKOTA | Eddy County, North Dakota | 38027 | 0 | 8605 | 3 | 7780 | 3 | 8192.5 | 0.000366 | True |
1994 | NORTH.DAKOTA | Renville County, North Dakota | 38075 | 1 | 9224 | 2 | 8504 | 3 | 8864.0 | 0.000338 | True |
2008 | NORTH.DAKOTA | Wells County, North Dakota | 38103 | 2 | 17064 | 3 | 15698 | 5 | 16381.0 | 0.000305 | True |
2114 | OKLAHOMA | Cotton County, Oklahoma | 40033 | 2 | 15669 | 2 | 14707 | 4 | 15188.0 | 0.000263 | True |
2119 | OKLAHOMA | Dewey County, Oklahoma | 40043 | 5 | 14299 | 1 | 13397 | 6 | 13848.0 | 0.000433 | True |
2128 | OKLAHOMA | Haskell County, Oklahoma | 40061 | 5 | 24288 | 1 | 23418 | 6 | 23853.0 | 0.000252 | True |
2135 | OKLAHOMA | Kiowa County, Oklahoma | 40075 | 6 | 27698 | 3 | 24807 | 9 | 26252.5 | 0.000343 | True |
2146 | OKLAHOMA | McIntosh County, Oklahoma | 40091 | 5 | 31374 | 3 | 31831 | 8 | 31602.5 | 0.000253 | True |
2185 | OREGON | Gilliam County, Oregon | 41021 | 1 | 4928 | 1 | 4273 | 2 | 4600.5 | 0.000435 | True |
2209 | OREGON | Wheeler County, Oregon | 41069 | 0 | 3700 | 1 | 3466 | 1 | 3583.0 | 0.000279 | True |
2283 | SOUTH.CAROLINA | Abbeville County, South Carolina | 45001 | 5 | 37296 | 5 | 38019 | 10 | 37657.5 | 0.000266 | True |
2357 | SOUTH.DAKOTA | Hand County, South Dakota | 46059 | 2 | 11996 | 1 | 11167 | 3 | 11581.5 | 0.000259 | True |
2358 | SOUTH.DAKOTA | Hanson County, South Dakota | 46061 | 2 | 8416 | 1 | 7828 | 3 | 8122.0 | 0.000369 | True |
2362 | SOUTH.DAKOTA | Hyde County, South Dakota | 46069 | 1 | 4584 | 1 | 4101 | 2 | 4342.5 | 0.000461 | True |
2365 | SOUTH.DAKOTA | Jones County, South Dakota | 46075 | 1 | 3632 | 0 | 3626 | 1 | 3629.0 | 0.000276 | True |
2372 | SOUTH.DAKOTA | McCook County, South Dakota | 46087 | 3 | 15383 | 2 | 14292 | 5 | 14837.5 | 0.000337 | True |
2382 | SOUTH.DAKOTA | Potter County, South Dakota | 46107 | 2 | 8964 | 2 | 8407 | 4 | 8685.5 | 0.000461 | True |
2389 | SOUTH.DAKOTA | Tripp County, South Dakota | 46123 | 3 | 16481 | 3 | 16100 | 6 | 16290.5 | 0.000368 | True |
2442 | TENNESSEE | Lake County, Tennessee | 47095 | 2 | 15750 | 2 | 14613 | 4 | 15181.5 | 0.000263 | True |
2497 | TEXAS | Austin County, Texas | 48015 | 8 | 40725 | 4 | 42896 | 12 | 41810.5 | 0.000287 | True |
2540 | TEXAS | Cottle County, Texas | 48101 | 2 | 6337 | 0 | 5473 | 2 | 5905.0 | 0.000339 | True |
2552 | TEXAS | Dickens County, Texas | 48125 | 2 | 7618 | 0 | 6507 | 2 | 7062.5 | 0.000283 | True |
2578 | TEXAS | Gonzales County, Texas | 48177 | 5 | 39364 | 8 | 39825 | 13 | 39594.5 | 0.000328 | True |
2621 | TEXAS | Kent County, Texas | 48263 | 0 | 2857 | 1 | 2606 | 1 | 2731.5 | 0.000366 | True |
2627 | TEXAS | Knox County, Texas | 48275 | 3 | 12332 | 0 | 11531 | 3 | 11931.5 | 0.000251 | True |
2639 | TEXAS | Llano County, Texas | 48299 | 3 | 25352 | 4 | 28130 | 7 | 26741.0 | 0.000262 | True |
2646 | TEXAS | Mason County, Texas | 48319 | 2 | 8609 | 1 | 8443 | 3 | 8526.0 | 0.000352 | True |
2656 | TEXAS | Mills County, Texas | 48333 | 2 | 10962 | 2 | 11028 | 4 | 10995.0 | 0.000364 | True |
2692 | TEXAS | San Augustine County, Texas | 48405 | 2 | 15026 | 2 | 14385 | 4 | 14705.5 | 0.000272 | True |
2695 | TEXAS | San Saba County, Texas | 48411 | 4 | 14253 | 0 | 13744 | 4 | 13998.5 | 0.000286 | True |
2698 | TEXAS | Shackelford County, Texas | 48417 | 1 | 10038 | 2 | 9248 | 3 | 9643.0 | 0.000311 | True |
2728 | TEXAS | Washington County, Texas | 48477 | 5 | 45594 | 7 | 49721 | 12 | 47657.5 | 0.000252 | True |
2802 | VIRGINIA | Brunswick County, Virginia | 51025 | 3 | 16201 | 3 | 16430 | 6 | 16315.5 | 0.000368 | True |
2816 | VIRGINIA | Covington City, Virginia | 51580 | 0 | 16971 | 5 | 14720 | 5 | 15845.5 | 0.000316 | True |
2824 | VIRGINIA | Essex County, Virginia | 51057 | 3 | 12481 | 1 | 12625 | 4 | 12553.0 | 0.000319 | True |
2848 | VIRGINIA | Highland County, Virginia | 51091 | 1 | 7157 | 2 | 6608 | 3 | 6882.5 | 0.000436 | True |
2968 | WEST.VIRGINIA | Doddridge County, West Virginia | 54017 | 2 | 18806 | 3 | 17969 | 5 | 18387.5 | 0.000272 | True |
3033 | WISCONSIN | Florence County, Wisconsin | 55037 | 2 | 10572 | 2 | 11035 | 4 | 10803.5 | 0.000370 | True |
3084 | WISCONSIN | Waushara County, Wisconsin | 55137 | 5 | 45859 | 8 | 46906 | 13 | 46382.5 | 0.000280 | True |
#It will be cool to plot the high cancer counties (in terms of the naive proportions) on the US map
#This can be done in the following way.
import plotly.express as px
#First convert fips to standard form:
d['modfips'] = d['fips'].apply(lambda x: str(x).zfill(5))
#Plotting the counties with high cancer proportion:
fig = px.choropleth(
d,
geojson="https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json", # This is a dataset containing county geometries
locations='modfips', # Use the 'fips' column for county identification
color='cancerhigh',
hover_name='Location', # Display the 'Location' column on hover
scope="usa",
title="Top 100 kidney cancer death rate counties (in terms of naive proportions)",
color_continuous_scale="YlOrRd"
)
fig.update_geos(fitbounds="locations")
fig.update_layout(margin={"r":0,"t":40,"l":0,"b":0})
fig.show()
Question: Is this a good way of flagging the high kidney cancer rate counties? Should we now look into specific reasons (such as dietary patterns, pollution etc, lack of access to quality medical care etc.) why these counties have high rates?
It turns out that this analysis is misleading because of the use of naive proportions as estimates of kidney cancer death rates. In particular, the counties that we flagged are mostly counties with low populations. This can be understood as follows. The typical rate of kidney cancer is about 0.0001 as can be seen, for example, from the naive proportion estimates of counties with large populations.
sorted_populations = d['popm'].sort_values()
print(list(sorted_populations))
#Select counties with large populations
#By large population, let us take populations more than 300000
largecounties = d[d['popm'] >= 300000]
largecounties.shape[0]
#So there are 323 counties with population more than 300000
#Let us evaluate the naive proportions for these counties.
summary = d[d['popm'] >= 300000]['naiveproportion'].describe()
display(summary)
#the mean and median are about 0.0001 which can therefore be viewed as a typical value of the kidney cancer rate
[262.5, 978.0, 1017.0, 1139.0, 1233.5, 1268.0, 1297.0, 1448.0, 1570.5, 1594.0, 1640.5, 1862.5, 1879.0, 1942.0, 1946.5, 2058.5, 2108.0, 2160.5, 2171.0, 2175.5, 2194.0, 2250.0, 2296.0, 2337.0, 2342.5, 2353.0, 2358.5, 2473.5, 2522.5, 2540.5, 2542.5, 2547.5, 2570.5, 2731.0, 2731.5, 2817.5, 2866.5, 3005.5, 3027.0, 3095.0, 3102.5, 3268.5, 3294.5, 3305.0, 3433.0, 3434.0, 3532.5, 3556.5, 3583.0, 3596.5, 3616.0, 3629.0, 3819.0, 3974.0, 4044.5, 4076.0, 4129.0, 4182.0, 4188.5, 4224.0, 4341.5, 4342.5, 4349.0, 4351.0, 4448.5, 4449.5, 4450.0, 4465.0, 4487.0, 4493.5, 4518.5, 4527.5, 4571.5, 4600.5, 4632.5, 4695.5, 4715.5, 4784.5, 4832.0, 4836.5, 4952.0, 4956.5, 5169.0, 5177.5, 5236.0, 5256.5, 5268.0, 5282.0, 5284.5, 5309.5, 5311.5, 5343.5, 5359.0, 5391.0, 5404.5, 5407.5, 5424.5, 5435.5, 5478.0, 5529.5, 5594.0, 5596.5, 5602.5, 5619.0, 5712.0, 5730.0, 5743.0, 5786.0, 5789.5, 5808.0, 5849.0, 5853.5, 5899.0, 5902.0, 5905.0, 5916.0, 5925.5, 5931.5, 6012.5, 6034.5, 6097.0, 6111.0, 6115.0, 6236.0, 6315.0, 6428.5, 6438.5, 6441.0, 6441.5, 6489.5, 6511.5, 6511.5, 6584.5, 6624.5, 6626.0, 6677.0, 6697.0, 6771.5, 6814.5, 6836.0, 6841.0, 6882.5, 7045.0, 7062.5, 7145.0, 7145.5, 7168.0, 7216.0, 7257.0, 7294.5, 7440.0, 7446.5, 7496.0, 7643.0, 7649.5, 7678.5, 7696.5, 7712.0, 7749.5, 7775.5, 7807.5, 7942.0, 7969.5, 8009.0, 8066.5, 8122.0, 8130.0, 8135.0, 8170.0, 8192.5, 8218.5, 8230.5, 8231.0, 8255.0, 8289.5, 8310.0, 8343.0, 8347.5, 8354.0, 8368.0, 8378.0, 8397.0, 8412.5, 8441.0, 8444.0, 8523.5, 8526.0, 8554.5, 8566.0, 8576.0, 8609.0, 8642.0, 8650.5, 8679.5, 8681.5, 8685.5, 8691.5, 8719.0, 8739.0, 8755.5, 8864.0, 8880.0, 8933.5, 8945.5, 8962.5, 9004.5, 9058.5, 9081.5, 9127.5, 9159.0, 9169.5, 9192.5, 9241.5, 9247.0, 9248.5, 9266.5, 9276.5, 9373.0, 9390.5, 9418.0, 9450.5, 9512.5, 9536.5, 9595.5, 9597.5, 9615.0, 9643.0, 9675.0, 9706.5, 9784.5, 9793.0, 9819.5, 9827.5, 9839.0, 9858.5, 9897.0, 9898.0, 9915.5, 9939.0, 9940.0, 9950.0, 9962.0, 10014.0, 10064.5, 10117.0, 10126.0, 10186.5, 10195.0, 10196.5, 10255.0, 10258.0, 10260.5, 10339.0, 10377.0, 10394.0, 10402.0, 10427.5, 10464.0, 10503.0, 10513.0, 10554.0, 10584.5, 10619.5, 10677.0, 10686.0, 10687.5, 10699.5, 10702.5, 10703.0, 10760.0, 10777.0, 10796.0, 10803.5, 10849.0, 10868.0, 10880.0, 10892.5, 10910.0, 10935.5, 10937.5, 10955.5, 10995.0, 11018.5, 11028.0, 11035.5, 11043.0, 11101.5, 11114.5, 11117.5, 11135.5, 11150.5, 11202.5, 11206.5, 11220.0, 11250.0, 11305.0, 11311.5, 11316.5, 11330.0, 11336.0, 11387.0, 11405.0, 11473.0, 11498.0, 11561.0, 11581.5, 11582.5, 11585.5, 11596.5, 11653.5, 11661.0, 11675.0, 11713.0, 11729.5, 11731.0, 11745.5, 11751.5, 11758.5, 11758.5, 11853.5, 11888.0, 11896.5, 11930.5, 11930.5, 11931.5, 11952.0, 11982.0, 12008.5, 12024.0, 12077.0, 12081.0, 12102.5, 12158.5, 12183.0, 12195.5, 12205.0, 12227.5, 12253.5, 12294.0, 12333.0, 12341.0, 12348.0, 12351.0, 12354.0, 12370.5, 12392.5, 12417.0, 12438.5, 12454.0, 12470.0, 12472.5, 12483.5, 12529.0, 12553.0, 12561.5, 12576.5, 12579.0, 12595.5, 12629.5, 12654.0, 12681.5, 12732.5, 12747.0, 12759.5, 12791.0, 12797.0, 12799.5, 12862.0, 12864.0, 12866.0, 12897.5, 12936.5, 12941.0, 12944.0, 12965.0, 13030.0, 13036.0, 13071.5, 13109.5, 13116.0, 13118.0, 13174.0, 13177.0, 13187.0, 13188.0, 13208.5, 13219.0, 13294.5, 13311.5, 13313.0, 13349.0, 13386.0, 13388.5, 13389.5, 13442.0, 13451.0, 13481.5, 13486.5, 13492.0, 13498.5, 13511.5, 13519.5, 13526.0, 13575.5, 13592.0, 13620.5, 13639.0, 13669.5, 13677.5, 13688.0, 13695.5, 13698.0, 13773.5, 13773.5, 13817.0, 13818.0, 13832.5, 13848.0, 13882.0, 13950.0, 13975.5, 13988.0, 13998.5, 14083.5, 14129.0, 14130.0, 14132.0, 14198.5, 14211.0, 14214.5, 14238.0, 14278.5, 14290.0, 14312.0, 14313.0, 14324.0, 14346.0, 14383.5, 14384.5, 14574.0, 14580.0, 14601.0, 14608.0, 14628.0, 14638.0, 14649.5, 14653.0, 14684.5, 14705.5, 14746.0, 14754.5, 14774.5, 14837.5, 14864.0, 14901.5, 14942.5, 14962.5, 14978.5, 14983.0, 15025.5, 15122.5, 15147.0, 15167.5, 15177.5, 15181.5, 15188.0, 15194.5, 15241.0, 15243.5, 15277.0, 15309.5, 15317.5, 15321.0, 15388.5, 15390.5, 15396.0, 15433.5, 15457.0, 15518.0, 15521.0, 15548.5, 15566.0, 15582.0, 15589.5, 15613.0, 15637.0, 15645.0, 15676.5, 15685.5, 15705.5, 15709.5, 15727.0, 15734.0, 15845.5, 15854.5, 15855.5, 15880.0, 15883.5, 15888.0, 15890.5, 16094.5, 16121.5, 16163.5, 16170.0, 16245.0, 16272.5, 16290.5, 16299.5, 16305.5, 16315.5, 16361.0, 16381.0, 16396.0, 16418.5, 16448.5, 16457.0, 16458.0, 16473.0, 16527.0, 16542.0, 16554.5, 16565.5, 16576.0, 16596.0, 16611.5, 16612.5, 16640.5, 16663.0, 16666.5, 16688.0, 16716.5, 16743.0, 16773.0, 16777.5, 16785.5, 16801.5, 16836.0, 16850.0, 16854.5, 16881.0, 16892.5, 16899.5, 16918.5, 16920.5, 16940.5, 17000.5, 17014.5, 17057.0, 17077.5, 17086.0, 17095.0, 17098.5, 17099.0, 17147.0, 17155.5, 17196.0, 17241.0, 17295.0, 17327.0, 17369.5, 17433.5, 17435.5, 17438.5, 17482.5, 17492.5, 17508.0, 17551.0, 17649.0, 17660.0, 17677.0, 17690.0, 17704.5, 17721.5, 17736.0, 17765.5, 17794.0, 17795.5, 17801.0, 17812.0, 17872.0, 17886.5, 17892.0, 17896.0, 17915.5, 17943.5, 17970.5, 18028.5, 18051.5, 18076.0, 18080.0, 18092.0, 18102.5, 18121.5, 18148.0, 18181.5, 18217.5, 18279.0, 18282.5, 18291.0, 18361.5, 18362.0, 18387.5, 18387.5, 18392.0, 18444.0, 18452.0, 18475.0, 18511.5, 18536.0, 18537.5, 18550.5, 18556.5, 18583.5, 18610.0, 18619.5, 18630.5, 18662.0, 18687.0, 18695.0, 18699.5, 18756.5, 18759.5, 18829.5, 18841.5, 18853.5, 18875.5, 18884.0, 18909.0, 18934.0, 18941.5, 18950.5, 19027.5, 19041.5, 19052.5, 19082.5, 19097.5, 19109.0, 19134.0, 19193.0, 19196.0, 19243.0, 19247.5, 19290.0, 19319.5, 19360.5, 19390.5, 19392.5, 19433.5, 19438.5, 19454.0, 19468.0, 19481.5, 19489.0, 19509.5, 19535.0, 19558.0, 19558.0, 19559.5, 19582.5, 19610.0, 19644.0, 19691.5, 19711.0, 19769.0, 19777.0, 19812.0, 19823.5, 19833.0, 19890.0, 19944.0, 19947.5, 19957.5, 19958.5, 20013.0, 20026.0, 20082.5, 20087.5, 20089.0, 20099.0, 20104.5, 20108.5, 20113.5, 20124.0, 20126.5, 20145.5, 20178.5, 20185.5, 20194.5, 20268.0, 20270.0, 20280.0, 20314.0, 20327.5, 20329.0, 20362.0, 20425.0, 20452.5, 20485.5, 20547.5, 20644.0, 20648.5, 20649.5, 20674.5, 20686.5, 20687.5, 20691.0, 20709.5, 20722.5, 20780.0, 20798.5, 20807.5, 20823.0, 20839.5, 20906.0, 20925.0, 20929.5, 20934.5, 20942.5, 20958.0, 20968.5, 20982.0, 21002.0, 21055.0, 21066.0, 21068.5, 21091.0, 21111.0, 21135.5, 21139.0, 21151.5, 21175.5, 21177.5, 21180.0, 21198.5, 21227.0, 21234.5, 21250.5, 21272.5, 21297.5, 21300.5, 21313.0, 21347.0, 21348.0, 21362.5, 21383.5, 21385.5, 21386.5, 21387.5, 21391.0, 21434.5, 21460.5, 21488.0, 21493.0, 21565.0, 21603.5, 21671.0, 21682.0, 21688.5, 21715.5, 21784.5, 21797.0, 21820.0, 21822.5, 21839.0, 21880.0, 21920.5, 21922.5, 21931.5, 21936.0, 21958.5, 22003.5, 22037.5, 22042.0, 22194.5, 22198.5, 22227.0, 22230.5, 22233.5, 22236.5, 22257.0, 22285.5, 22292.5, 22294.5, 22321.5, 22326.0, 22381.0, 22416.0, 22425.0, 22427.5, 22473.0, 22479.0, 22552.5, 22560.0, 22567.0, 22591.0, 22607.5, 22620.5, 22638.5, 22649.5, 22673.5, 22723.5, 22747.0, 22767.5, 22791.5, 22844.5, 22940.0, 23004.0, 23006.0, 23114.0, 23119.5, 23121.5, 23124.5, 23164.0, 23212.5, 23242.0, 23250.0, 23260.0, 23321.5, 23366.5, 23369.5, 23370.0, 23436.0, 23481.5, 23557.0, 23559.0, 23610.0, 23619.5, 23632.0, 23658.0, 23668.5, 23684.5, 23695.5, 23706.0, 23723.0, 23737.5, 23743.5, 23760.0, 23771.5, 23853.0, 23863.5, 23869.5, 23887.5, 23939.0, 23982.5, 23983.0, 24021.0, 24034.5, 24052.5, 24063.5, 24112.0, 24145.5, 24224.0, 24232.5, 24243.5, 24280.5, 24297.0, 24324.0, 24383.5, 24436.5, 24483.5, 24492.0, 24526.0, 24550.0, 24550.5, 24566.0, 24665.5, 24677.5, 24707.0, 24734.5, 24740.5, 24780.5, 24793.0, 24813.5, 24884.0, 24902.5, 24950.5, 24963.0, 25011.0, 25019.5, 25054.0, 25167.5, 25186.0, 25222.0, 25276.5, 25360.0, 25405.5, 25486.0, 25531.0, 25574.5, 25593.0, 25617.5, 25624.5, 25650.0, 25676.0, 25702.5, 25704.5, 25721.5, 25743.0, 25746.5, 25806.0, 25808.0, 25824.0, 25827.5, 25883.5, 25962.5, 26061.5, 26108.5, 26124.5, 26183.0, 26220.0, 26220.0, 26233.5, 26239.0, 26240.0, 26245.5, 26252.5, 26326.0, 26330.0, 26372.0, 26382.0, 26382.0, 26457.5, 26463.0, 26472.5, 26475.5, 26529.5, 26544.0, 26587.0, 26599.0, 26601.0, 26601.5, 26615.5, 26616.5, 26721.5, 26738.5, 26741.0, 26754.0, 26831.0, 26852.5, 26854.5, 26861.5, 26881.0, 26904.5, 26916.5, 26953.0, 26990.5, 26995.5, 27010.5, 27053.0, 27103.0, 27110.0, 27148.5, 27176.0, 27209.0, 27215.5, 27230.0, 27249.0, 27257.0, 27299.5, 27343.5, 27386.0, 27400.0, 27408.5, 27440.5, 27486.0, 27506.5, 27542.0, 27573.0, 27597.0, 27601.5, 27614.5, 27616.0, 27659.0, 27672.0, 27692.5, 27722.0, 27745.5, 27758.5, 27885.5, 27967.5, 27991.5, 28030.5, 28038.0, 28051.0, 28107.0, 28140.0, 28150.5, 28153.5, 28163.5, 28181.0, 28183.0, 28185.0, 28206.5, 28242.5, 28294.0, 28363.0, 28363.5, 28365.0, 28460.5, 28490.0, 28533.0, 28562.5, 28595.0, 28642.0, 28703.5, 28745.0, 28775.0, 28778.0, 28848.5, 28891.5, 28913.5, 29081.5, 29100.5, 29107.0, 29127.0, 29128.5, 29201.0, 29211.5, 29212.5, 29228.0, 29274.5, 29293.0, 29346.0, 29402.5, 29402.5, 29412.0, 29416.0, 29433.5, 29480.5, 29519.0, 29558.0, 29575.0, 29583.0, 29599.5, 29614.5, 29649.0, 29707.5, 29726.5, 29745.5, 29751.0, 29761.0, 29765.0, 29774.5, 29800.5, 29828.5, 29851.5, 29852.5, 29901.0, 29902.0, 29911.0, 29943.0, 29959.0, 30028.5, 30070.5, 30089.0, 30089.5, 30128.0, 30133.0, 30152.5, 30152.5, 30178.5, 30230.0, 30250.0, 30259.5, 30321.5, 30327.5, 30399.5, 30403.0, 30411.5, 30475.0, 30477.5, 30509.0, 30524.5, 30598.0, 30663.5, 30674.0, 30691.0, 30695.5, 30733.0, 30786.0, 30819.5, 30831.0, 30837.5, 30894.0, 30933.5, 31044.0, 31096.0, 31109.5, 31171.5, 31171.5, 31239.0, 31304.0, 31314.5, 31342.5, 31418.5, 31457.0, 31458.0, 31506.5, 31529.5, 31568.5, 31602.5, 31622.5, 31628.5, 31641.5, 31682.5, 31694.0, 31792.5, 31808.5, 31816.5, 31842.5, 31848.0, 31848.5, 31856.0, 31871.5, 31905.5, 31911.0, 31927.0, 31953.5, 32028.5, 32123.5, 32139.0, 32184.5, 32218.5, 32245.0, 32272.0, 32287.0, 32348.5, 32380.0, 32425.5, 32457.5, 32460.5, 32461.0, 32478.0, 32494.5, 32533.0, 32571.0, 32571.0, 32631.5, 32723.5, 32751.0, 32757.5, 32767.5, 32863.5, 32930.5, 32994.0, 33026.0, 33041.5, 33044.5, 33096.5, 33118.5, 33130.0, 33143.0, 33149.0, 33165.0, 33209.5, 33262.0, 33314.0, 33356.0, 33356.5, 33371.5, 33402.5, 33429.0, 33463.0, 33521.0, 33561.5, 33615.0, 33640.5, 33649.5, 33651.5, 33663.5, 33708.0, 33718.0, 33738.5, 33792.0, 33853.0, 33882.0, 33900.0, 33912.0, 33984.5, 33987.0, 34037.0, 34046.5, 34065.0, 34070.5, 34077.0, 34106.5, 34112.5, 34142.0, 34176.5, 34196.5, 34205.0, 34236.5, 34264.5, 34266.0, 34285.5, 34309.0, 34326.5, 34345.0, 34364.0, 34372.5, 34380.5, 34408.0, 34414.5, 34467.5, 34476.5, 34551.5, 34564.0, 34657.0, 34662.0, 34670.0, 34711.0, 34790.0, 34862.5, 34978.0, 34993.5, 35021.5, 35035.5, 35115.5, 35213.5, 35312.5, 35339.0, 35339.5, 35388.5, 35401.5, 35433.5, 35469.5, 35471.5, 35501.5, 35501.5, 35512.5, 35543.0, 35551.5, 35595.0, 35649.5, 35658.0, 35685.5, 35733.0, 35754.0, 35765.5, 35765.5, 35778.0, 35835.0, 35842.5, 35892.5, 35897.0, 35898.5, 35900.0, 35975.5, 35980.0, 36084.5, 36124.5, 36140.0, 36234.5, 36243.5, 36264.0, 36334.0, 36344.5, 36353.5, 36384.5, 36444.0, 36480.5, 36488.0, 36493.0, 36500.0, 36514.0, 36531.5, 36557.5, 36582.5, 36701.5, 36738.5, 36752.5, 36809.0, 36836.0, 36890.5, 36922.0, 36924.0, 36937.5, 36946.5, 36966.0, 36989.5, 37174.5, 37194.0, 37291.5, 37329.5, 37406.5, 37454.5, 37524.5, 37607.5, 37657.5, 37678.0, 37752.0, 37752.0, 37794.5, 37801.5, 37806.0, 37831.5, 37838.5, 37845.0, 37858.5, 37883.5, 37900.5, 37940.0, 38007.5, 38055.5, 38097.0, 38136.0, 38162.0, 38167.5, 38185.5, 38191.5, 38264.0, 38273.5, 38348.0, 38376.0, 38469.5, 38494.0, 38527.0, 38574.0, 38614.5, 38622.0, 38648.0, 38731.0, 38760.5, 38768.5, 38769.0, 38811.5, 38822.0, 38826.5, 38846.5, 38869.0, 38887.5, 38949.0, 39017.0, 39017.5, 39032.5, 39043.5, 39134.5, 39209.0, 39256.0, 39282.0, 39290.0, 39338.5, 39351.0, 39369.0, 39424.0, 39443.5, 39472.5, 39515.5, 39530.0, 39545.0, 39550.0, 39554.0, 39594.5, 39604.5, 39637.5, 39649.0, 39677.0, 39717.0, 39732.5, 39742.0, 39770.5, 39773.0, 39783.0, 39799.5, 39849.0, 39881.0, 39885.0, 39893.0, 39951.5, 39974.0, 40014.0, 40131.5, 40146.0, 40160.0, 40223.0, 40248.5, 40314.0, 40330.5, 40539.5, 40542.5, 40569.5, 40570.0, 40587.0, 40601.5, 40724.0, 40805.0, 40830.0, 40912.0, 40975.0, 40994.0, 41002.5, 41025.5, 41042.0, 41149.5, 41188.0, 41219.0, 41231.5, 41274.5, 41276.5, 41384.0, 41389.5, 41415.0, 41444.5, 41505.5, 41548.5, 41572.0, 41686.5, 41793.5, 41810.5, 41859.5, 41872.0, 41898.0, 41951.5, 42009.0, 42023.5, 42025.5, 42112.5, 42133.5, 42149.0, 42191.0, 42195.5, 42202.5, 42264.5, 42266.0, 42270.0, 42295.0, 42350.5, 42373.0, 42440.5, 42445.5, 42468.0, 42773.5, 42806.5, 42848.5, 42891.0, 42926.5, 43041.0, 43091.5, 43098.0, 43107.0, 43116.5, 43130.0, 43133.5, 43152.0, 43178.0, 43192.0, 43266.5, 43291.5, 43312.5, 43437.0, 43458.0, 43514.5, 43551.5, 43569.5, 43570.5, 43601.0, 43611.5, 43614.5, 43637.0, 43711.0, 43741.0, 43798.5, 43809.5, 43827.5, 43889.5, 43902.0, 43915.0, 43935.5, 43942.5, 44025.5, 44045.0, 44065.5, 44100.0, 44126.5, 44200.0, 44249.5, 44254.5, 44295.5, 44321.0, 44353.0, 44371.0, 44381.5, 44455.0, 44483.0, 44590.5, 44645.5, 44689.5, 44692.5, 44693.5, 44754.0, 44784.0, 44787.0, 44870.5, 44885.0, 44948.5, 44975.0, 45025.0, 45066.0, 45100.5, 45155.0, 45198.0, 45200.5, 45227.0, 45268.5, 45277.5, 45277.5, 45282.5, 45290.5, 45348.0, 45372.5, 45377.5, 45384.0, 45418.5, 45446.0, 45625.5, 45668.5, 45794.5, 45794.5, 45808.0, 45867.0, 45899.0, 45947.0, 46069.5, 46204.0, 46353.5, 46381.0, 46382.5, 46401.0, 46422.0, 46509.0, 46533.0, 46561.5, 46591.0, 46644.5, 46689.0, 46702.5, 46854.5, 46972.5, 46983.5, 47029.5, 47065.5, 47087.0, 47331.0, 47346.0, 47352.5, 47389.0, 47491.0, 47574.0, 47657.5, 47660.5, 47750.0, 47779.0, 47840.0, 47850.5, 47904.0, 47930.0, 47955.0, 47999.0, 48069.0, 48110.5, 48117.0, 48170.0, 48174.0, 48179.0, 48202.5, 48263.0, 48300.5, 48319.5, 48371.0, 48372.5, 48543.5, 48558.5, 48695.5, 48800.5, 48885.5, 48951.5, 49014.5, 49058.5, 49080.0, 49091.0, 49117.0, 49276.0, 49299.5, 49337.5, 49453.0, 49478.0, 49559.5, 49577.0, 49603.5, 49624.5, 49725.5, 49747.5, 49785.0, 49793.0, 49878.5, 49880.0, 49896.5, 49961.0, 50180.0, 50180.5, 50202.5, 50226.0, 50266.5, 50307.0, 50388.0, 50395.5, 50475.0, 50503.0, 50548.5, 50571.5, 50674.5, 50727.0, 50756.5, 50790.5, 50796.0, 50940.0, 50957.5, 50984.5, 51033.5, 51110.5, 51155.0, 51158.0, 51192.5, 51235.0, 51271.5, 51276.0, 51337.0, 51365.0, 51379.0, 51589.0, 51592.5, 51699.0, 51737.5, 51745.5, 51753.0, 51772.0, 51783.5, 51848.0, 51928.5, 51976.0, 51985.5, 51997.5, 52092.5, 52112.5, 52357.5, 52413.0, 52515.0, 52598.5, 52607.5, 52642.0, 52675.5, 52719.0, 52731.5, 52774.0, 52810.5, 52862.0, 52894.5, 52923.0, 52942.0, 52970.0, 53068.5, 53125.0, 53128.0, 53283.0, 53341.0, 53500.5, 53500.5, 53779.0, 53933.5, 54007.5, 54090.0, 54091.0, 54093.0, 54115.5, 54120.5, 54128.5, 54163.0, 54180.5, 54282.5, 54328.0, 54442.5, 54468.0, 54540.0, 54688.0, 54721.5, 54750.5, 54798.0, 54875.0, 54903.5, 55088.5, 55162.5, 55309.5, 55439.5, 55492.5, 55585.0, 55837.0, 56061.5, 56146.0, 56153.0, 56210.0, 56252.5, 56324.0, 56375.0, 56529.5, 56567.5, 56632.0, 56673.5, 56708.0, 56721.0, 56725.5, 56760.5, 56923.0, 56965.0, 56966.0, 57004.0, 57022.0, 57026.0, 57035.0, 57194.5, 57221.5, 57231.5, 57241.0, 57245.0, 57321.0, 57358.0, 57360.5, 57368.5, 57388.5, 57436.0, 57447.5, 57509.5, 57510.5, 57538.5, 57613.5, 57666.0, 57765.0, 57798.0, 57829.0, 57931.0, 57933.5, 57996.5, 58284.5, 58368.5, 58445.0, 58448.5, 58631.5, 58770.5, 58811.0, 58899.0, 58922.5, 59038.5, 59084.0, 59294.5, 59310.0, 59312.5, 59326.5, 59354.0, 59449.5, 59530.5, 59556.0, 59739.5, 59845.0, 59886.5, 59907.0, 59921.5, 60079.0, 60120.5, 60124.0, 60187.5, 60234.0, 60315.5, 60380.5, 60524.5, 60525.0, 60657.5, 60681.5, 60683.0, 60718.0, 60824.5, 60848.5, 60860.5, 60924.5, 60927.5, 60974.0, 61028.5, 61229.0, 61240.5, 61324.0, 61365.5, 61381.0, 61455.5, 61488.5, 61554.5, 61568.5, 61575.5, 61605.5, 61612.0, 61725.0, 61843.5, 61945.0, 61994.5, 62187.5, 62243.0, 62324.0, 62350.0, 62350.5, 62351.5, 62354.5, 62521.5, 62598.5, 62656.0, 62845.0, 62849.0, 63038.0, 63076.0, 63116.0, 63209.5, 63259.5, 63295.5, 63348.0, 63418.0, 63473.0, 63490.5, 63496.5, 63551.0, 63592.0, 63619.5, 63699.0, 63731.0, 63756.0, 63774.0, 63839.5, 63895.0, 63911.0, 64038.5, 64119.5, 64154.0, 64169.0, 64187.0, 64261.0, 64277.0, 64280.5, 64444.0, 64490.5, 64511.5, 64560.5, 64662.0, 64665.0, 64846.5, 64854.0, 65001.5, 65037.0, 65096.5, 65137.0, 65219.5, 65291.0, 65316.5, 65422.5, 65561.0, 65696.5, 65703.5, 65757.5, 65877.0, 65893.5, 65957.5, 66019.0, 66191.5, 66202.0, 66296.0, 66511.5, 66573.0, 66654.5, 66697.5, 66778.5, 66888.0, 66897.0, 66937.5, 66948.5, 67260.5, 67314.5, 67325.0, 67377.0, 67383.0, 67418.5, 67480.0, 67760.5, 67788.5, 67936.5, 68109.5, 68184.0, 68233.0, 68372.5, 68470.5, 68663.0, 68674.5, 68692.5, 68862.0, 68866.5, 68928.5, 69053.0, 69175.0, 69263.5, 69355.5, 69402.5, 69445.0, 69662.5, 69782.5, 69970.0, 69980.0, 70028.5, 70070.0, 70185.5, 70201.5, 70203.5, 70258.0, 70302.5, 70411.5, 70549.0, 70582.0, 70706.5, 70797.0, 70859.0, 70875.5, 70976.0, 70995.5, 71012.5, 71070.0, 71076.5, 71329.5, 71410.5, 71496.0, 71610.5, 71818.0, 71849.5, 71922.5, 71941.5, 71959.0, 72004.5, 72045.5, 72128.0, 72130.0, 72204.0, 72210.0, 72302.0, 72432.0, 72466.0, 72487.5, 72498.5, 72641.5, 72767.5, 72779.0, 72803.5, 72980.0, 73055.0, 73138.5, 73219.0, 73254.0, 73294.5, 73353.0, 73419.5, 73423.0, 73481.0, 73540.5, 73592.5, 73679.5, 73856.0, 73892.0, 73916.0, 73956.5, 74054.0, 74128.0, 74308.0, 74359.5, 74463.0, 74769.5, 74826.5, 75060.5, 75085.5, 75102.0, 75133.0, 75370.5, 75428.0, 75449.0, 75536.0, 75594.0, 75623.5, 75744.0, 75791.0, 75792.0, 75828.5, 75884.5, 75994.5, 76004.5, 76030.5, 76109.5, 76159.5, 76192.0, 76200.0, 76227.0, 76274.5, 76323.5, 76408.5, 76411.0, 76490.0, 76596.5, 76656.0, 76684.0, 76699.5, 76911.0, 76946.0, 77039.0, 77061.0, 77090.5, 77563.5, 77613.0, 77654.5, 78057.5, 78145.5, 78282.5, 78547.5, 78712.0, 78947.0, 79380.0, 79383.5, 79440.5, 79692.0, 79727.5, 79887.0, 79926.5, 79948.0, 79980.5, 80013.5, 80020.0, 80207.5, 80250.5, 80531.0, 80741.0, 80743.5, 80789.0, 80824.5, 81055.0, 81077.0, 81108.0, 81134.0, 81231.5, 81294.5, 81324.5, 81393.5, 81418.0, 81586.0, 81712.5, 81869.0, 81923.0, 81942.5, 81952.0, 82020.5, 82024.0, 82121.5, 82131.0, 82277.0, 82329.5, 82489.5, 82493.5, 82558.5, 82614.5, 82643.0, 82670.5, 82848.0, 82969.5, 82987.5, 82992.5, 83066.0, 83106.5, 83159.5, 83210.5, 83234.0, 83346.5, 83409.0, 83494.0, 83579.0, 83602.5, 83682.0, 84146.0, 84196.5, 84235.0, 84266.0, 84393.0, 84523.5, 84540.5, 84557.0, 84614.0, 84870.0, 84954.5, 85037.5, 85060.5, 85274.5, 85296.0, 85311.0, 85392.0, 85463.0, 85489.5, 85547.5, 85692.0, 85820.5, 85830.0, 85846.5, 85921.0, 85991.5, 86068.0, 86095.0, 86118.0, 86161.0, 86293.0, 86521.5, 86739.0, 86822.5, 86823.5, 86920.0, 86997.0, 87210.5, 87242.5, 87292.5, 87300.5, 87325.5, 87645.0, 87680.0, 87834.0, 87948.5, 88064.0, 88084.0, 88242.5, 88242.5, 88261.0, 88317.5, 88493.0, 88538.5, 88598.5, 88705.5, 88712.5, 88730.5, 89300.0, 89467.0, 89530.5, 89574.0, 89634.5, 89715.5, 89800.5, 89944.5, 90041.0, 90063.0, 90161.0, 90179.0, 90377.0, 90456.0, 90505.0, 91244.0, 91398.0, 91601.0, 91775.0, 91935.5, 91975.5, 92041.5, 92056.0, 92208.0, 92225.5, 92230.0, 92682.5, 92911.0, 92968.0, 93112.0, 93118.0, 93231.5, 93389.0, 93392.0, 93509.0, 93512.5, 93631.0, 93651.5, 94088.0, 94102.0, 94151.5, 94196.0, 94447.0, 94571.5, 94806.5, 94928.0, 94979.0, 95036.5, 95156.0, 95256.0, 95290.5, 95319.0, 95360.0, 95405.0, 95586.5, 95607.0, 95645.5, 95745.5, 95860.5, 96034.5, 96211.0, 96289.0, 96350.5, 96373.0, 96386.5, 96498.5, 96592.5, 96628.5, 96638.5, 96653.0, 96696.5, 96697.5, 96762.0, 96958.0, 97318.0, 97382.5, 97579.0, 97789.5, 97834.0, 97902.0, 97908.0, 98145.5, 98385.5, 98580.5, 98941.0, 99018.0, 99117.5, 99220.0, 99364.5, 100081.0, 100084.5, 100149.5, 100153.5, 100225.0, 100468.0, 100707.5, 100802.0, 101057.0, 101067.0, 101139.5, 101157.0, 101186.5, 101268.5, 101411.0, 101677.5, 101692.5, 102262.0, 102295.5, 102319.0, 102390.0, 102772.0, 102794.0, 103159.0, 103268.5, 103299.5, 103460.0, 103557.0, 103560.5, 103696.5, 103912.0, 104513.0, 104576.5, 105160.0, 105617.0, 105692.0, 105780.0, 105788.5, 105935.0, 106013.5, 106091.5, 106448.0, 106519.0, 106804.5, 106884.0, 107155.0, 107431.5, 107495.0, 107737.0, 108022.0, 108207.0, 108228.5, 108618.5, 108823.0, 108914.0, 109213.0, 109460.0, 109649.0, 109725.5, 109844.5, 109874.5, 109930.5, 110073.0, 110290.0, 110733.0, 110738.0, 110814.0, 110889.0, 111250.0, 111407.0, 111660.0, 111766.5, 111806.5, 111808.0, 112118.5, 112184.5, 112431.5, 112620.0, 112830.0, 112834.0, 113034.5, 113040.0, 113191.0, 113205.5, 113464.5, 113506.0, 113947.0, 113971.0, 113982.5, 114548.0, 114563.0, 114584.5, 114595.5, 114675.5, 114737.0, 114946.0, 114965.0, 115089.0, 115102.0, 115196.0, 115217.0, 115457.5, 115568.5, 115684.5, 115914.0, 116003.5, 116515.5, 116524.5, 116600.5, 116886.0, 116977.0, 117292.0, 117338.0, 117762.0, 117805.0, 117931.5, 118052.5, 118356.0, 118598.0, 118762.0, 119012.5, 119259.5, 119996.5, 120336.5, 120757.0, 120814.5, 120976.0, 120983.0, 121185.5, 121194.0, 121458.0, 121560.0, 121583.5, 121736.0, 122036.0, 122240.5, 122408.5, 122485.0, 122885.5, 123198.0, 123211.0, 123347.5, 123848.5, 124211.0, 124638.5, 125085.5, 125202.0, 125549.0, 125604.0, 125629.0, 125799.5, 125951.0, 125952.5, 126309.5, 126335.0, 126408.5, 126622.0, 126804.0, 126963.0, 126988.5, 127268.0, 127385.0, 127425.0, 127499.0, 127521.5, 128061.0, 128261.5, 128479.0, 128723.0, 129193.5, 129290.5, 129355.5, 129908.0, 130210.0, 130265.5, 130420.0, 130484.5, 130793.5, 130908.0, 130982.5, 131352.0, 131375.5, 131437.5, 131508.5, 131549.5, 131636.5, 132171.0, 132178.5, 132754.5, 132965.0, 133985.5, 134233.5, 134646.5, 135291.0, 135357.0, 135912.0, 135928.5, 136499.5, 136543.5, 136815.5, 136844.5, 136861.5, 137446.0, 137653.0, 137968.0, 138062.0, 138120.0, 138128.0, 138142.5, 138236.5, 138271.5, 138313.5, 138336.0, 138429.0, 138558.0, 138824.5, 138886.5, 140079.0, 140140.5, 140276.5, 140298.0, 140427.0, 140602.5, 141914.0, 142038.5, 142116.5, 142815.0, 142952.0, 143011.0, 143017.5, 143115.0, 143392.0, 143485.0, 144097.0, 144119.0, 144127.0, 144583.5, 144673.0, 144818.5, 145303.5, 145777.5, 146004.5, 146269.5, 146557.0, 146600.0, 146811.5, 147051.5, 147748.5, 147855.0, 148389.5, 148510.5, 148588.5, 149456.0, 150066.0, 150324.0, 150431.5, 150554.0, 151215.5, 151500.5, 151580.0, 151742.5, 151908.5, 152026.5, 152304.5, 152310.0, 152888.5, 152975.5, 153357.0, 154165.5, 154210.5, 154573.0, 154890.0, 154927.0, 155051.5, 155130.0, 155520.0, 155629.0, 155781.0, 155910.5, 156072.0, 156360.0, 156489.0, 156509.5, 156744.5, 156773.5, 156811.0, 157274.0, 157596.0, 157983.5, 158473.5, 158659.0, 158782.0, 159032.5, 159727.0, 159995.0, 160086.5, 160278.5, 160406.0, 160638.0, 161185.0, 161668.5, 161948.5, 161953.5, 162684.5, 162694.5, 163034.0, 163121.0, 164662.5, 164725.5, 165328.5, 165408.0, 165499.5, 166433.5, 167029.5, 167109.0, 167296.5, 167901.5, 167992.0, 168542.5, 168649.5, 168700.0, 169112.0, 169203.0, 169593.5, 169689.0, 169830.0, 170345.5, 170908.0, 171901.0, 172141.0, 172572.0, 172977.0, 173222.5, 173313.5, 173675.5, 173682.5, 173923.5, 174031.5, 174118.0, 174338.5, 174813.0, 174988.0, 175103.5, 175452.5, 176947.0, 177165.5, 177776.5, 177819.5, 178012.5, 179152.0, 180016.5, 181055.0, 181222.5, 181505.5, 182181.5, 182345.5, 182527.5, 183099.0, 183980.0, 184034.0, 184907.0, 185288.5, 185602.5, 185762.5, 185782.5, 185860.0, 186112.5, 188219.0, 188436.5, 188673.0, 189166.0, 189696.0, 189905.0, 190403.0, 190403.0, 190706.5, 191400.5, 192392.0, 192426.0, 193566.0, 194054.5, 194517.5, 194529.0, 194707.0, 194776.5, 195396.5, 195533.5, 196071.5, 196362.5, 196867.5, 196889.0, 197319.5, 197823.5, 197974.5, 198343.5, 199213.0, 199638.0, 199811.5, 200132.5, 200142.0, 200311.0, 201291.0, 201451.0, 201649.5, 202086.5, 202817.5, 203922.5, 204009.0, 204024.0, 204400.0, 204537.5, 204639.0, 205129.0, 205480.0, 205575.5, 205931.0, 205959.0, 206282.5, 207690.5, 208536.5, 209183.0, 209678.0, 209987.5, 210473.5, 210612.0, 210855.5, 210996.0, 211045.5, 211232.5, 212768.5, 213373.0, 213657.5, 213678.0, 213850.0, 214100.0, 214482.0, 214763.0, 215507.5, 215790.0, 215823.0, 216197.0, 216369.5, 216891.5, 217102.0, 217956.5, 218044.5, 218540.5, 218618.5, 219324.0, 219823.5, 220797.0, 221283.0, 222142.0, 224746.0, 224857.0, 225738.0, 225898.0, 227024.5, 227250.0, 227484.0, 227509.5, 227785.0, 228194.5, 228262.5, 231395.5, 231788.5, 232615.5, 232691.5, 232882.5, 234506.0, 235062.5, 235083.5, 235651.5, 235964.0, 236590.5, 236713.0, 237761.5, 238007.0, 238063.0, 238485.5, 238800.0, 238867.0, 239380.0, 240102.0, 240887.0, 241366.0, 241855.5, 241927.5, 242120.5, 243795.5, 243930.0, 244429.5, 244742.5, 246679.5, 252211.0, 253248.0, 254128.5, 255115.0, 255154.5, 256884.5, 257446.5, 258049.0, 259297.5, 260413.0, 261734.0, 262364.0, 262586.0, 264111.5, 264833.0, 265213.5, 265306.0, 265315.5, 265586.0, 266216.5, 266565.5, 269666.5, 269848.5, 271002.0, 271524.5, 271975.5, 272367.5, 273081.0, 273114.0, 273371.0, 273694.5, 274477.0, 274687.5, 275259.5, 275450.5, 275805.0, 276472.5, 277085.5, 277203.5, 278264.0, 279299.5, 282358.5, 282715.0, 285946.0, 286664.5, 287324.0, 287460.0, 287507.0, 288081.0, 288407.0, 291015.0, 291338.0, 292154.0, 292644.5, 295113.0, 295411.0, 296209.5, 297081.5, 297384.5, 297706.5, 298518.0, 298635.5, 298836.0, 299845.0, 302485.5, 302975.0, 303137.5, 306182.0, 306739.5, 307292.0, 310066.0, 312021.0, 312470.0, 313348.0, 315064.5, 315453.5, 318163.5, 319584.0, 320155.5, 320647.5, 320863.5, 321353.5, 322216.5, 322798.5, 323096.5, 325197.0, 326633.5, 327897.0, 328805.5, 329165.0, 329894.5, 330234.0, 330666.5, 331396.0, 331443.5, 331781.5, 332071.5, 334023.0, 334791.5, 335443.0, 335822.0, 335980.5, 336321.5, 338180.5, 338850.0, 339124.5, 341451.0, 342457.0, 345553.5, 352019.0, 355382.5, 355567.0, 356419.0, 356608.5, 357634.0, 357980.5, 359045.0, 359494.0, 360255.0, 361588.0, 363717.0, 364200.5, 365117.0, 368412.0, 368593.0, 369478.5, 370172.0, 370303.0, 372674.5, 373428.0, 377082.0, 377796.5, 379264.0, 379361.5, 381615.5, 382188.5, 383697.0, 387789.5, 388732.0, 391038.0, 391219.0, 391734.0, 392149.0, 395894.5, 397605.5, 402769.0, 404682.5, 405897.5, 406835.5, 407653.5, 408173.5, 412533.5, 414750.0, 415945.5, 416192.0, 419601.5, 429515.0, 432218.5, 433599.0, 434050.0, 434690.5, 439973.5, 440892.5, 449592.5, 450007.5, 451024.0, 455340.0, 455649.5, 455731.5, 458367.5, 463245.0, 464700.5, 466569.5, 471290.5, 471903.5, 475332.0, 482390.5, 483389.5, 486309.5, 489399.5, 489661.0, 491551.0, 492648.0, 493716.0, 494139.5, 494661.0, 496041.5, 496053.0, 506986.5, 509464.0, 509685.0, 511291.0, 516047.0, 516896.5, 519268.5, 520458.0, 523040.5, 523799.5, 524449.5, 526831.5, 529653.0, 532686.5, 536959.5, 538905.5, 547029.0, 547283.5, 547663.0, 548484.5, 553199.5, 561284.5, 561795.0, 563872.0, 566661.0, 572744.0, 574110.5, 575038.5, 580411.0, 583282.0, 583549.5, 587944.5, 589699.5, 596123.0, 600769.5, 603609.0, 606725.0, 611865.0, 616511.5, 619920.5, 620226.5, 621086.5, 624050.5, 632111.5, 635957.0, 637258.0, 637533.5, 638621.5, 644182.5, 648209.5, 660511.0, 662589.0, 668351.5, 671075.5, 687235.0, 691622.0, 693248.5, 697132.5, 699442.0, 709646.5, 710319.0, 730158.0, 731247.5, 736126.0, 738736.5, 742569.0, 746129.5, 754323.5, 754886.0, 761062.5, 763822.0, 769661.0, 771391.0, 777628.0, 782960.5, 783817.5, 786206.5, 791353.0, 796658.5, 803439.5, 804494.5, 816277.5, 819141.5, 822248.0, 828930.5, 829901.0, 830321.5, 846649.5, 850666.0, 860304.5, 866784.5, 877342.5, 896740.0, 897953.0, 898073.0, 904580.0, 905033.0, 905465.5, 913927.0, 923289.0, 949088.5, 960919.5, 967893.0, 970311.0, 971369.0, 973803.0, 990416.0, 1011849.0, 1017333.5, 1023812.0, 1024256.0, 1025093.5, 1028073.5, 1036202.0, 1048524.0, 1084504.0, 1088207.5, 1091301.0, 1092363.5, 1096649.5, 1122398.5, 1124051.0, 1126563.5, 1130932.0, 1141381.0, 1143823.5, 1147248.0, 1159200.5, 1164822.5, 1179012.0, 1185355.5, 1186137.0, 1199373.0, 1213850.0, 1226447.0, 1227795.0, 1241730.5, 1250344.5, 1271099.5, 1308830.5, 1335205.5, 1354857.0, 1367452.0, 1384484.0, 1393260.5, 1407600.5, 1444082.0, 1456089.5, 1463575.0, 1466029.0, 1487542.5, 1496226.0, 1509531.0, 1550382.5, 1557642.0, 1620713.5, 1621954.5, 1647187.5, 1651719.5, 1656036.0, 1671846.0, 1679297.0, 1743127.0, 1771679.0, 1777221.0, 1796861.0, 1818715.5, 1834863.5, 1840023.5, 1868414.0, 2030366.0, 2034570.0, 2062262.0, 2165584.5, 2177527.0, 2282103.5, 2296950.0, 2303886.5, 2372053.5, 2391551.5, 2456863.0, 2608833.0, 2860748.0, 2904168.0, 2910685.5, 2938108.0, 2943333.0, 3154520.5, 3201162.0, 3220943.0, 3272794.0, 3325314.5, 3372201.5, 4128670.5, 4676037.0, 4858142.0, 5239794.0, 8991108.5, 15937146.5]
count 323.000000 mean 0.000098 std 0.000029 min 0.000034 25% 0.000081 50% 0.000095 75% 0.000115 max 0.000227 Name: naiveproportion, dtype: float64
Now we can explain why our initial flagged set of high cancer rate counties is dominated by counties with low populations. Consider a county with a small population of 1000. In the given ten-year period, suppose this county has 1 kidney cancer death. The naive proportion would then be 1/1000 = 0.001 which is already 10 times the typical rate of 0.0001. So this county would immediately be flagged as a high cancer rate county just because of the single death. Go back to our flagged list of high population counties and check the population sizes.
d[d['naiveproportion'] >= threshold]['popm'].describe()
#The county with the largest population in this list is 60000 but there are counties with population as small as 1448.
count 100.000000 mean 16940.950000 std 11922.792393 min 1448.000000 25% 8442.625000 50% 14204.250000 75% 21550.375000 max 60380.500000 Name: popm, dtype: float64
Bayesian estimates of the kidney cancer death rates¶
For Bayesian estimates, we need an appropriate prior. One way of obtaining such a prior is to consider the naive proportions of the large population counties (say, with population at least 300000) and to fit a Beta density to the naive proportions. This can be done via mean and variance as follows.
proportions_largecounties = d['naiveproportion'][d['popm'] >= 300000] #filter out the high population counties
mean_largecounties = np.mean(proportions_largecounties)
var_largecounties = np.var(proportions_largecounties)
display(mean_largecounties, var_largecounties)
9.813238615946435e-05
8.172661645662484e-10
For what values of $a$ and $b$ would the $\text{Beta}(a, b)$ density have mean $m$ and variance $V$. To answer this, we need to basically solve \begin{align*} \frac{a}{a+b} = m ~~ \text{ and } ~~ \frac{a}{a+b} \frac{b}{a+b} \frac{1}{a+b+1} = V. \end{align*} Plugging $m$ for $\frac{a}{a+b}$ and $1-m$ for $\frac{b}{a+b}$ in the second equation, we obtain \begin{align*} \frac{m(1-m)}{a+b+1} = V \implies a+b = \frac{m(1-m)}{V} - 1. \end{align*} Combining this with $\frac{a}{a+b} = m$, we obtain \begin{align*} a = m\left(\frac{m(1-m)}{V} - 1 \right) ~~~ \text{ and } ~~~ b = (1-m)\left(\frac{m(1-m)}{V} - 1 \right) \end{align*} Of course, $m$ and $V$ should be such that $a$ and $b$ as computed above should be strictly bigger than zero. If not, this method of figuring out $a$ and $b$ using mean and variance does not work.
#Using the above formula, we obtain
m = mean_largecounties
V = var_largecounties
a = ((m*m*(1-m))/V) - m
b = (((1-m)*(1-m)*m)/V) - (1-m)
display(a, b)
11.781889938777496
120049.39668603124
Note that here $a$ is much smaller than $b$ which means that the Beta density will be concentrated on very small values (this makes sense because this is intended to be a prior for the kidney cancer rate which we believe to be small with typical values around 0.0001).
#Plot of this Beta density
x = np.arange(0, 8e-4, 1e-6)
y = beta.pdf(x, a, b)
plt.plot(x, y, label = f"Beta({a}, {b})")
plt.title(f'Beta Density for a = {a} and b = {b}')
plt.xlabel("x")
plt.ylabel("Density")
plt.legend()
plt.show()
We now compute the Bayesian estimates of kidney cancer death rates. For a county with population $POP$ and death count $DC$, our posterior density of the rate would be \begin{align*} \text{Beta}(a + DC, b + POP - DC) \end{align*} Thus the posterior mean (which will be our Bayesian estimate of the kidney cancer death rate) is \begin{align*} \frac{a + DC}{a + b + POP}. \end{align*}
d['bayesestimate'] = (a + d['dct']) / (a + b + d['popm'])
#We can now flag the top 100 counties using the Bayesian estimate:
threshold = d['bayesestimate'].nlargest(100).iloc[-1]
d['bayeshigh'] = d['bayesestimate'] >= threshold
#Let us look at these counties to see if they are different from the previous flagged list:
with pd.option_context('display.max_rows', None):
display(d[d['bayeshigh'] >= True])
state | Location | fips | dc | pop | dc.2 | pop.2 | dct | popm | naiveproportion | cancerhigh | modfips | bayesestimate | bayeshigh | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
61 | ALABAMA | Tallapoosa County, Alabama | 1123 | 8 | 67871 | 7 | 68002 | 15 | 67936.5 | 0.000221 | False | 01123 | 0.000142 | True |
71 | ARIZONA | Graham County, Arizona | 4009 | 7 | 51342 | 6 | 53873 | 13 | 52607.5 | 0.000247 | False | 04009 | 0.000144 | True |
74 | ARIZONA | Mohave County, Arizona | 4015 | 8 | 149437 | 22 | 189941 | 30 | 169689.0 | 0.000177 | False | 04015 | 0.000144 | True |
79 | ARIZONA | Yavapai County, Arizona | 4025 | 14 | 176766 | 21 | 223856 | 35 | 200311.0 | 0.000175 | False | 04025 | 0.000146 | True |
106 | ARKANSAS | Garland County, Arkansas | 5051 | 10 | 154925 | 18 | 158622 | 28 | 156773.5 | 0.000179 | False | 05051 | 0.000144 | True |
148 | ARKANSAS | Sharp County, Arkansas | 5135 | 4 | 34922 | 5 | 33806 | 9 | 34364.0 | 0.000262 | True | 05135 | 0.000135 | True |
172 | CALIFORNIA | Lake County, California | 6033 | 12 | 93785 | 11 | 109600 | 23 | 101692.5 | 0.000226 | False | 06033 | 0.000157 | True |
183 | CALIFORNIA | Napa County, California | 6055 | 13 | 236568 | 29 | 247143 | 42 | 241855.5 | 0.000174 | False | 06055 | 0.000149 | True |
280 | CONNECTICUT | Middlesex County, Connecticut | 9007 | 20 | 304656 | 35 | 320284 | 55 | 312470.0 | 0.000176 | False | 09007 | 0.000154 | True |
294 | FLORIDA | Broward County, Florida | 12011 | 197 | 2212734 | 207 | 2381166 | 404 | 2296950.0 | 0.000176 | False | 12011 | 0.000172 | True |
296 | FLORIDA | Charlotte County, Florida | 12015 | 16 | 154371 | 27 | 209992 | 43 | 182181.5 | 0.000236 | False | 12015 | 0.000181 | True |
297 | FLORIDA | Citrus County, Florida | 12017 | 14 | 146051 | 23 | 191349 | 37 | 168700.0 | 0.000219 | False | 12017 | 0.000169 | True |
299 | FLORIDA | Collier County, Florida | 12021 | 22 | 229457 | 25 | 301715 | 47 | 265586.0 | 0.000177 | False | 12021 | 0.000152 | True |
314 | FLORIDA | Hernando County, Florida | 12053 | 8 | 125721 | 21 | 191597 | 29 | 158659.0 | 0.000183 | False | 12053 | 0.000146 | True |
323 | FLORIDA | Lee County, Florida | 12071 | 36 | 509853 | 62 | 657246 | 98 | 583549.5 | 0.000168 | False | 12071 | 0.000156 | True |
328 | FLORIDA | Manatee County, Florida | 12081 | 31 | 345451 | 27 | 413077 | 58 | 379264.0 | 0.000153 | False | 12081 | 0.000140 | True |
329 | FLORIDA | Marion County, Florida | 12083 | 16 | 280325 | 33 | 361402 | 49 | 320863.5 | 0.000153 | False | 12083 | 0.000138 | True |
330 | FLORIDA | Martin County, Florida | 12085 | 18 | 161947 | 15 | 202744 | 33 | 182345.5 | 0.000181 | False | 12085 | 0.000148 | True |
338 | FLORIDA | Palm Beach County, Florida | 12099 | 111 | 1314930 | 145 | 1617128 | 256 | 1466029.0 | 0.000175 | False | 12099 | 0.000169 | True |
339 | FLORIDA | Pasco County, Florida | 12101 | 58 | 498118 | 54 | 598851 | 112 | 548484.5 | 0.000204 | False | 12101 | 0.000185 | True |
340 | FLORIDA | Pinellas County, Florida | 12103 | 146 | 1612081 | 153 | 1746513 | 299 | 1679297.0 | 0.000178 | False | 12103 | 0.000173 | True |
346 | FLORIDA | Sarasota County, Florida | 12115 | 55 | 485595 | 65 | 573711 | 120 | 529653.0 | 0.000227 | False | 12115 | 0.000203 | True |
352 | FLORIDA | Volusia County, Florida | 12127 | 43 | 596543 | 50 | 724479 | 93 | 660511.0 | 0.000141 | False | 12127 | 0.000134 | True |
568 | ILLINOIS | Bureau County, Illinois | 17011 | 7 | 92928 | 12 | 87198 | 19 | 90063.0 | 0.000211 | False | 17011 | 0.000146 | True |
570 | ILLINOIS | Carroll County, Illinois | 17015 | 5 | 44858 | 7 | 41446 | 12 | 43152.0 | 0.000278 | True | 17015 | 0.000146 | True |
575 | ILLINOIS | Clay County, Illinois | 17025 | 8 | 37117 | 5 | 35052 | 13 | 36084.5 | 0.000360 | True | 17025 | 0.000159 | True |
612 | ILLINOIS | LaSalle County, Illinois | 17099 | 22 | 265008 | 25 | 255818 | 47 | 260413.0 | 0.000180 | False | 17099 | 0.000154 | True |
618 | ILLINOIS | Macoupin County, Illinois | 17117 | 9 | 116685 | 11 | 114684 | 20 | 115684.5 | 0.000173 | False | 17117 | 0.000135 | True |
630 | ILLINOIS | Montgomery County, Illinois | 17135 | 10 | 77006 | 7 | 75055 | 17 | 76030.5 | 0.000224 | False | 17135 | 0.000147 | True |
683 | INDIANA | Dubois County, Indiana | 18037 | 9 | 85189 | 7 | 88456 | 16 | 86822.5 | 0.000184 | False | 18037 | 0.000134 | True |
732 | INDIANA | Randolph County, Indiana | 18135 | 11 | 70673 | 5 | 67051 | 16 | 68862.0 | 0.000232 | False | 18135 | 0.000147 | True |
789 | IOWA | Fayette County, Iowa | 19065 | 6 | 60513 | 7 | 53969 | 13 | 57241.0 | 0.000227 | False | 19065 | 0.000140 | True |
793 | IOWA | Greene County, Iowa | 19073 | 4 | 28283 | 4 | 25160 | 8 | 26721.5 | 0.000299 | True | 19073 | 0.000135 | True |
819 | IOWA | Marion County, Iowa | 19125 | 8 | 72517 | 6 | 72415 | 14 | 72466.0 | 0.000193 | False | 19125 | 0.000134 | True |
833 | IOWA | Polk County, Iowa | 19153 | 45 | 683277 | 52 | 699967 | 97 | 691622.0 | 0.000140 | False | 19153 | 0.000134 | True |
839 | IOWA | Shelby County, Iowa | 19165 | 4 | 36486 | 5 | 33239 | 9 | 34862.5 | 0.000258 | True | 19165 | 0.000134 | True |
844 | IOWA | Union County, Iowa | 19175 | 7 | 32300 | 2 | 30043 | 9 | 31171.5 | 0.000289 | True | 19175 | 0.000137 | True |
933 | KANSAS | Reno County, Kansas | 20155 | 15 | 152347 | 15 | 148516 | 30 | 150431.5 | 0.000199 | False | 20155 | 0.000154 | True |
935 | KANSAS | Rice County, Kansas | 20159 | 5 | 27283 | 5 | 25668 | 10 | 26475.5 | 0.000378 | True | 20159 | 0.000149 | True |
951 | KANSAS | Sumner County, Kansas | 20191 | 5 | 60141 | 12 | 60620 | 17 | 60380.5 | 0.000282 | True | 20191 | 0.000160 | True |
1039 | KENTUCKY | McCracken County, Kentucky | 21145 | 18 | 130840 | 8 | 130747 | 26 | 130793.5 | 0.000199 | False | 21145 | 0.000151 | True |
1081 | LOUISIANA | Acadia Parish, Louisiana | 22001 | 8 | 115970 | 12 | 114422 | 20 | 115196.0 | 0.000174 | False | 22001 | 0.000135 | True |
1115 | LOUISIANA | Natchitoches Parish, Louisiana | 22069 | 7 | 60729 | 6 | 57924 | 13 | 59326.5 | 0.000219 | False | 22069 | 0.000138 | True |
1116 | LOUISIANA | Orleans Parish, Louisiana | 22071 | 46 | 561106 | 41 | 484975 | 87 | 523040.5 | 0.000166 | False | 22071 | 0.000154 | True |
1172 | MARYLAND | Garrett County, Maryland | 24023 | 5 | 65546 | 9 | 65847 | 14 | 65696.5 | 0.000213 | False | 24023 | 0.000139 | True |
1185 | MASSACHUSETTS | Barnstable County, Massachusetts | 25001 | 23 | 354916 | 36 | 403807 | 59 | 379361.5 | 0.000156 | False | 25001 | 0.000142 | True |
1230 | MICHIGAN | Huron County, Michigan | 26063 | 8 | 88924 | 9 | 86973 | 17 | 87948.5 | 0.000193 | False | 26063 | 0.000138 | True |
1249 | MICHIGAN | Manistee County, Michigan | 26101 | 9 | 54118 | 3 | 51671 | 12 | 52894.5 | 0.000227 | False | 26101 | 0.000138 | True |
1263 | MICHIGAN | Ogemaw County, Michigan | 26129 | 3 | 41030 | 7 | 43017 | 10 | 42023.5 | 0.000238 | False | 26129 | 0.000134 | True |
1270 | MICHIGAN | Roscommon County, Michigan | 26143 | 5 | 41957 | 6 | 44959 | 11 | 43458.0 | 0.000253 | True | 26143 | 0.000139 | True |
1295 | MINNESOTA | Clay County, Minnesota | 27027 | 12 | 117417 | 8 | 115614 | 20 | 116515.5 | 0.000172 | False | 27027 | 0.000134 | True |
1312 | MINNESOTA | Itasca County, Minnesota | 27061 | 8 | 103740 | 13 | 98574 | 21 | 101157.0 | 0.000208 | False | 27061 | 0.000148 | True |
1330 | MINNESOTA | Morrison County, Minnesota | 27097 | 11 | 72683 | 6 | 72875 | 17 | 72779.0 | 0.000234 | False | 27097 | 0.000149 | True |
1337 | MINNESOTA | Otter Tail County, Minnesota | 27111 | 11 | 128042 | 14 | 125202 | 25 | 126622.0 | 0.000197 | False | 27111 | 0.000149 | True |
1350 | MINNESOTA | Saint Louis County, Minnesota | 27137 | 36 | 517061 | 37 | 472261 | 73 | 494661.0 | 0.000148 | False | 27137 | 0.000138 | True |
1358 | MINNESOTA | Todd County, Minnesota | 27153 | 5 | 62645 | 8 | 59836 | 13 | 61240.5 | 0.000212 | False | 27153 | 0.000137 | True |
1367 | MINNESOTA | Wright County, Minnesota | 27171 | 12 | 149423 | 13 | 160837 | 25 | 155130.0 | 0.000161 | False | 27171 | 0.000134 | True |
1455 | MISSOURI | Barry County, Missouri | 29009 | 5 | 59669 | 9 | 64018 | 14 | 61843.5 | 0.000226 | False | 29009 | 0.000142 | True |
1457 | MISSOURI | Bates County, Missouri | 29013 | 9 | 37094 | 2 | 36524 | 11 | 36809.0 | 0.000299 | True | 29013 | 0.000145 | True |
1485 | MISSOURI | Dunklin County, Missouri | 29069 | 7 | 78502 | 8 | 72396 | 15 | 75449.0 | 0.000199 | False | 29069 | 0.000137 | True |
1487 | MISSOURI | Gasconade County, Missouri | 29073 | 6 | 32412 | 7 | 33035 | 13 | 32723.5 | 0.000397 | True | 29073 | 0.000162 | True |
1545 | MISSOURI | Saint Louis City, Missouri | 29510 | 43 | 530715 | 31 | 503078 | 74 | 516896.5 | 0.000143 | False | 29510 | 0.000135 | True |
1557 | MISSOURI | Taney County, Missouri | 29213 | 7 | 51901 | 6 | 58424 | 13 | 55162.5 | 0.000236 | False | 29213 | 0.000141 | True |
1635 | NEBRASKA | Cedar County, Nebraska | 31027 | 4 | 27890 | 4 | 25872 | 8 | 26881.0 | 0.000298 | True | 31027 | 0.000135 | True |
1746 | NEW.JERSEY | Cape May County, New Jersey | 34009 | 13 | 190513 | 18 | 206174 | 31 | 198343.5 | 0.000156 | False | 34009 | 0.000134 | True |
1748 | NEW.JERSEY | Essex County, New Jersey | 34013 | 74 | 1186092 | 84 | 1108404 | 158 | 1147248.0 | 0.000138 | False | 34013 | 0.000134 | True |
1756 | NEW.JERSEY | Ocean County, New Jersey | 34029 | 56 | 823481 | 82 | 931204 | 138 | 877342.5 | 0.000157 | False | 34029 | 0.000150 | True |
1810 | NEW.YORK | Essex County, New York | 36031 | 12 | 88129 | 6 | 88356 | 18 | 88242.5 | 0.000204 | False | 36031 | 0.000143 | True |
1901 | NORTH.CAROLINA | Henderson County, North Carolina | 37089 | 11 | 140912 | 15 | 152711 | 26 | 146811.5 | 0.000177 | False | 37089 | 0.000142 | True |
1909 | NORTH.CAROLINA | Lee County, North Carolina | 37105 | 6 | 70067 | 10 | 74908 | 16 | 72487.5 | 0.000221 | False | 37105 | 0.000144 | True |
1937 | NORTH.CAROLINA | Rutherford County, North Carolina | 37161 | 7 | 115562 | 13 | 119022 | 20 | 117292.0 | 0.000171 | False | 37161 | 0.000134 | True |
2123 | OKLAHOMA | Grady County, Oklahoma | 40051 | 10 | 96371 | 7 | 95698 | 17 | 96034.5 | 0.000177 | False | 40051 | 0.000133 | True |
2135 | OKLAHOMA | Kiowa County, Oklahoma | 40075 | 6 | 27698 | 3 | 24807 | 9 | 26252.5 | 0.000343 | True | 40075 | 0.000142 | True |
2144 | OKLAHOMA | McClain County, Oklahoma | 40087 | 7 | 51642 | 6 | 53979 | 13 | 52810.5 | 0.000246 | False | 40087 | 0.000143 | True |
2160 | OKLAHOMA | Pottawatomie County, Oklahoma | 40125 | 11 | 125946 | 10 | 125653 | 21 | 125799.5 | 0.000167 | False | 40125 | 0.000133 | True |
2165 | OKLAHOMA | Sequoyah County, Oklahoma | 40135 | 6 | 62768 | 7 | 64416 | 13 | 63592.0 | 0.000204 | False | 40135 | 0.000135 | True |
2166 | OKLAHOMA | Stephens County, Oklahoma | 40137 | 7 | 104268 | 13 | 97866 | 20 | 101067.0 | 0.000198 | False | 40137 | 0.000144 | True |
2178 | OREGON | Clatsop County, Oregon | 41007 | 6 | 78001 | 9 | 78114 | 15 | 78057.5 | 0.000192 | False | 41007 | 0.000135 | True |
2195 | OREGON | Lincoln County, Oregon | 41041 | 9 | 86243 | 8 | 86079 | 17 | 86161.0 | 0.000197 | False | 41041 | 0.000140 | True |
2221 | PENNSYLVANIA | Cambria County, Pennsylvania | 42021 | 30 | 421206 | 32 | 392465 | 62 | 406835.5 | 0.000152 | False | 42021 | 0.000140 | True |
2226 | PENNSYLVANIA | Clarion County, Pennsylvania | 42031 | 7 | 104518 | 11 | 101026 | 18 | 102772.0 | 0.000175 | False | 42031 | 0.000134 | True |
2238 | PENNSYLVANIA | Franklin County, Pennsylvania | 42055 | 21 | 271832 | 21 | 278687 | 42 | 275259.5 | 0.000153 | False | 42055 | 0.000136 | True |
2245 | PENNSYLVANIA | Lackawanna County, Pennsylvania | 42069 | 37 | 521701 | 50 | 510393 | 87 | 516047.0 | 0.000169 | False | 42069 | 0.000155 | True |
2254 | PENNSYLVANIA | Mifflin County, Pennsylvania | 42087 | 19 | 110161 | 6 | 109290 | 25 | 109725.5 | 0.000228 | False | 42087 | 0.000160 | True |
2258 | PENNSYLVANIA | Northampton County, Pennsylvania | 42095 | 37 | 536374 | 42 | 557684 | 79 | 547029.0 | 0.000144 | False | 42095 | 0.000136 | True |
2264 | PENNSYLVANIA | Schuylkill County, Pennsylvania | 42107 | 30 | 374792 | 25 | 364165 | 55 | 369478.5 | 0.000149 | False | 42107 | 0.000136 | True |
2266 | PENNSYLVANIA | Somerset County, Pennsylvania | 42111 | 15 | 196243 | 15 | 190889 | 30 | 193566.0 | 0.000155 | False | 42111 | 0.000133 | True |
2281 | RHODE.ISLAND | Providence County, Rhode Island | 44007 | 83 | 1265359 | 91 | 1276840 | 174 | 1271099.5 | 0.000137 | False | 44007 | 0.000134 | True |
2283 | SOUTH.CAROLINA | Abbeville County, South Carolina | 45001 | 5 | 37296 | 5 | 38019 | 10 | 37657.5 | 0.000266 | True | 45001 | 0.000138 | True |
2308 | SOUTH.CAROLINA | Horry County, South Carolina | 45051 | 20 | 214099 | 18 | 267675 | 38 | 240887.0 | 0.000158 | False | 45051 | 0.000138 | True |
2434 | TENNESSEE | Henry County, Tennessee | 47079 | 6 | 61818 | 9 | 60913 | 15 | 61365.5 | 0.000244 | False | 47079 | 0.000148 | True |
2447 | TENNESSEE | Loudon County, Tennessee | 47105 | 6 | 70000 | 9 | 72153 | 15 | 71076.5 | 0.000211 | False | 47105 | 0.000140 | True |
2497 | TEXAS | Austin County, Texas | 48015 | 8 | 40725 | 4 | 42896 | 12 | 41810.5 | 0.000287 | True | 48015 | 0.000147 | True |
2578 | TEXAS | Gonzales County, Texas | 48177 | 5 | 39364 | 8 | 39825 | 13 | 39594.5 | 0.000328 | True | 48177 | 0.000155 | True |
2728 | TEXAS | Washington County, Texas | 48477 | 5 | 45594 | 7 | 49721 | 12 | 47657.5 | 0.000252 | True | 48477 | 0.000142 | True |
2739 | TEXAS | Wood County, Texas | 48499 | 10 | 56053 | 3 | 62572 | 13 | 59312.5 | 0.000219 | False | 48499 | 0.000138 | True |
3030 | WISCONSIN | Douglas County, Wisconsin | 55031 | 12 | 105866 | 10 | 98772 | 22 | 102319.0 | 0.000215 | False | 55031 | 0.000152 | True |
3074 | WISCONSIN | Sheboygan County, Wisconsin | 55117 | 16 | 244470 | 22 | 244389 | 38 | 244429.5 | 0.000155 | False | 55117 | 0.000137 | True |
3077 | WISCONSIN | Vernon County, Wisconsin | 55123 | 7 | 64887 | 6 | 64001 | 13 | 64444.0 | 0.000202 | False | 55123 | 0.000134 | True |
3084 | WISCONSIN | Waushara County, Wisconsin | 55137 | 5 | 45859 | 8 | 46906 | 13 | 46382.5 | 0.000280 | True | 55137 | 0.000149 | True |
Clearly now there are many counties with large populations in the list. We can plot these counties now on the US map as before.
#Plotting counties with high Bayesian estimates of kidney cancer rates on the US map
import plotly.express as px
#First convert fips to standard form:
d['modfips'] = d['fips'].apply(lambda x: str(x).zfill(5))
#Plotting the counties with high cancer proportion:
fig = px.choropleth(
d,
geojson="https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json", # This is a dataset containing county geometries
locations='modfips', # Use the 'fips' column for county identification
color='bayeshigh',
hover_name='Location', # Display the 'Location' column on hover
scope="usa",
title="Top 100 kidney cancer death rate counties (in terms of Bayes estimates)",
color_continuous_scale="YlOrRd"
)
fig.update_geos(fitbounds="locations")
fig.update_layout(margin={"r":0,"t":40,"l":0,"b":0})
fig.show()
#Look at the states for the high counties (under both the naive and Bayes estimates)
counts = d.groupby('state').agg({'cancerhigh': 'sum', 'bayeshigh': 'sum'})
print(counts)
cancerhigh bayeshigh state ALABAMA 1 1 ARIZONA 0 3 ARKANSAS 2 2 CALIFORNIA 0 2 COLORADO 1 0 CONNECTICUT 0 1 DELAWARE 0 0 DISTRICT.OF.COLUMBIA 0 0 FLORIDA 0 14 GEORGIA 4 0 HAWAII 0 0 IDAHO 2 0 ILLINOIS 3 6 INDIANA 0 2 IOWA 5 6 KANSAS 6 3 KENTUCKY 2 1 LOUISIANA 0 3 MAINE 0 0 MARYLAND 0 1 MASSACHUSETTS 0 1 MICHIGAN 4 4 MINNESOTA 1 7 MISSISSIPPI 8 0 MISSOURI 7 6 MONTANA 6 0 NEBRASKA 5 1 NEVADA 1 0 NEW.HAMPSHIRE 0 0 NEW.JERSEY 0 3 NEW.MEXICO 0 0 NEW.YORK 0 1 NORTH.CAROLINA 2 3 NORTH.DAKOTA 4 0 OHIO 0 0 OKLAHOMA 5 6 OREGON 2 2 PENNSYLVANIA 0 8 RHODE.ISLAND 0 1 SOUTH.CAROLINA 1 2 SOUTH.DAKOTA 7 0 TENNESSEE 1 2 TEXAS 13 4 UTAH 0 0 VERMONT 0 0 VIRGINIA 4 0 WASHINGTON 0 0 WEST.VIRGINIA 1 0 WISCONSIN 2 4 WYOMING 0 0
Clearly, there are quite many differences between the two estimates.
Let us now see how the Bayes rule converts prior to posterior for some specific counties.
#Consider the county in row 1681 of the dataset:
print(d.iloc[1681])
state NEBRASKA Location McPherson County, Nebraska fips 31117 dc 0 pop 1504 dc.2 1 pop.2 1392 dct 1 popm 1448.0 naiveproportion 0.000691 cancerhigh True modfips 31117 bayesestimate 0.000105 bayeshigh False Name: 1681, dtype: object
#This county has low population (1448) and the naiveproportion is 0.000691 (which is roughly seven times the typical value of 0.0001)
#So this has been flagged as a high risk county by the naive estimates
#Note though that this is not a high risk county according to the Bayesian estimate
x = np.arange(0, 8e-4, 1e-6)
Tot = round(d.loc[1681, 'popm'])
Pos = d.loc[1681, 'dct']
Neg = Tot - Pos
plt.plot(x, beta.pdf(x, a, b), color = 'red', label = 'Prior')
plt.plot(x, beta.pdf(x, a + Pos, b + Neg), label = 'Posterior')
plt.axvline(x = Pos/Tot, color = 'blue') #line at the naive proportion
plt.legend()
plt.show()
#Basically there is very little difference between the prior and the posterior
#In other words, the Bayesian posterior is largely ignoring the "review" data and only going by the prior
#This makes sense because there is not much information from the observed count for such a small county.
#Next let us look at the county in row 1624. This is Arthur County again Nebraska
#It has a small population of 1233 and death count of 0.
print(d.iloc[1624])
x = np.arange(0, 8e-4, 1e-6)
Tot = round(d.loc[1624, 'popm'])
Pos = d.loc[1624, 'dct']
Neg = Tot - Pos
plt.plot(x, beta.pdf(x, a, b), color = 'red', label = 'Prior')
plt.plot(x, beta.pdf(x, a + Pos, b + Neg), label = 'Posterior')
plt.axvline(x = Pos/Tot, color = 'blue') #line at the naive proportion
plt.legend()
plt.show()
#Similar story here; not much difference between prior and posterior
state NEBRASKA Location Arthur County, Nebraska fips 31005 dc 0 pop 1295 dc.2 0 pop.2 1172 dct 0 popm 1233.5 naiveproportion 0.0 cancerhigh False modfips 31005 bayesestimate 0.000097 bayeshigh False Name: 1624, dtype: object
print(d.iloc[346])
#This is a large county (population around half a million) and a large death count (120)
x = np.arange(0, 8e-4, 1e-6)
Tot = round(d.loc[346, 'popm'])
Pos = d.loc[346, 'dct']
Neg = Tot - Pos
plt.plot(x, beta.pdf(x, a, b), color = 'red', label = 'Prior')
plt.plot(x, beta.pdf(x, a + Pos, b + Neg), label = 'Posterior')
plt.axvline(x = Pos/Tot, color = 'blue') #line at the naive proportion
plt.legend()
plt.show()
#The Bayesian posterior moves quite a bit in the direction of the frequentist estimate
#The frequentist analysis does not flag this as a high-risk county because there were many low population counties
#which dominated its proportion.
state FLORIDA Location Sarasota County, Florida fips 12115 dc 55 pop 485595 dc.2 65 pop.2 573711 dct 120 popm 529653.0 naiveproportion 0.000227 cancerhigh False modfips 12115 bayesestimate 0.000203 bayeshigh True Name: 346, dtype: object
Bayesian estimates are often said to perform "Shrinkage". They shrink the frequentist estimates towards the prior. Overall, the variability of Bayesian estimates will be quite a bit smaller compared to frequentist estimates.
h1, _, _ = plt.hist(d['naiveproportion'], bins=100, color='blue', label='Naive Estimate', alpha=0.5)
h2, _, _ = plt.hist(d['bayesestimate'], bins=100, color='red', label='Bayes Estimate', alpha=0.5)
plt.title("Histograms of the Naive and Bayesian estimates")
plt.xlabel("Estimates")
plt.legend()
plt.show()
This shrinkage is one reason why the Bayes method was able to flag Sarasota County, Florida as a high risk county even though its estimate was only 0.000203; this was already extreme compared to the rest of the Bayesian estimates.
Recall that the original dataset had two time periods 80-84 and 85-89. We now repeat the analysis only for the first time period and attempt to predict the proportions for the second time period. We can compare the performances of the naive proportions and the Bayes estimates in this prediction problem.
d['naiveproportion1'] = d['dc'] / d['pop'] #these are the naive estimates
#For the Bayes estimates, we first need to estimate the prior:
proportions1_largecounties = d['naiveproportion1'][d['pop'] >= 300000] #filter out the high population counties
m1 = np.mean(proportions1_largecounties)
V1 = np.var(proportions1_largecounties)
a1 = ((m1*m1*(1-m1))/V1) - m1
b1 = (((1-m1)*(1-m1)*m1)/V1) - (1-m1)
d['bayesestimate1'] = (d['dc'] + a1)/(d['pop'] + a1 + b1)
#Proportions from the second half of the data:
d['proportion2'] = d['dc.2'] / d['pop.2']
#Prediction Accuracy:
sum_diff_naiveproportion1 = d['naiveproportion1'] - d['proportion2']
sum_abs_diff_naiveproportion1 = sum_diff_naiveproportion1.abs().sum()
sum_diff_bayesestimate1 = d['bayesestimate1'] - d['proportion2']
sum_abs_diff_bayesestimate1 = sum_diff_bayesestimate1.abs().sum()
bothresults = [sum_abs_diff_naiveproportion1, sum_abs_diff_bayesestimate1]
print(bothresults)
[0.12508482755988196, 0.10530340006718594]
There is thus an improvement (slight) in the prediction accuracy using Bayes estimates as opposed to the naive estimates, in terms of the sum of absolute deviations (note that other comparison metrics might show different results).