Epi 101: Direct Age Adjustment by Hand and With R

When you’re comparing two populations’ death rates, you need to take into account the differences in the age distributions of the two populations. For example, if one population has more young people in it, you might not see as many deaths from heart conditions as you would see in a population with more adults over the age of 50. The two populations may not really differ in their rates of death from heart conditions; the difference in age structure alone can account for the gap.

For Example

Let’s say you are comparing deaths from cancer in two populations, A and B, and you know that the particular cancer you’re looking at hits a lot of people over the age of 75, but only a few under the age of 15. The other age groups are affected somewhere in the middle. Let’s look at the data:

Age Group	Population A	Number of Deaths
0-15	10,000	100
16-35	12,000	600
36-55	15,000	1,050
56-75	25,000	2,500
Over 75	30,000	4,500
Total	92,000	8,750
	Crude Death Rate:	8,750/92,000 = 95 per 1,000

Age Group	Population B	Number of Deaths
0-15	25,000	250
16-35	15,000	750
36-55	20,000	1,400
56-75	12,000	1,200
Over 75	11,000	1,650
Total	83,000	5,250
	Crude Death Rate:	5,250/83,000 = 63 per 1,000
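
If you want to check the crude rates in the two tables yourself, here is a minimal base R sketch. The vector names are just labels chosen for illustration; the numbers are copied from the tables above.

```r
# Population counts and deaths by age group, copied from the tables above
population_a <- c(10000, 12000, 15000, 25000, 30000)
population_b <- c(25000, 15000, 20000, 12000, 11000)
deaths_a <- c(100, 600, 1050, 2500, 4500)
deaths_b <- c(250, 750, 1400, 1200, 1650)

# Crude rate = total deaths divided by total population
crude_a <- sum(deaths_a) / sum(population_a)
crude_b <- sum(deaths_b) / sum(population_b)

# Express per 1,000 residents, rounded to whole numbers (A: 95, B: 63)
crude_per_1000 <- round(1000 * c(A = crude_a, B = crude_b))
print(crude_per_1000)
```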

So population A has a death rate of 95 deaths per 1,000 residents. Population B has a death rate of 63 deaths per 1,000 residents. Clearly, if you want to stay away from dying from this kind of cancer, you need to go to population B, right? Well…

Look Closer

If you pay close attention, you will notice that the age distributions of the two populations are different. Population A has most of its residents in the 56-75 and Over 75 age groups. Population B, on the other hand, has most of its residents in the 0-15 and 36-55 age groups. That is, population B is a bit younger. So could it be that they have a lower crude death rate because they’re younger?
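
One way to make that difference concrete is to turn the population counts into percentages. This is a small base R sketch, with variable names picked only for this illustration:

```r
age_groups <- c("0-15", "16-35", "36-55", "56-75", ">75")
population_a <- c(10000, 12000, 15000, 25000, 30000)
population_b <- c(25000, 15000, 20000, 12000, 11000)

# Share of each population that falls in each age group
share_a <- population_a / sum(population_a)
share_b <- population_b / sum(population_b)

age_shares <- data.frame(age = age_groups,
                         pct_a = round(100 * share_a, 1),
                         pct_b = round(100 * share_b, 1))
print(age_shares)

# Share of residents in the two oldest groups (56-75 and Over 75)
older_a <- sum(share_a[4:5])
older_b <- sum(share_b[4:5])
print(round(100 * c(A = older_a, B = older_b), 1))
```

Roughly 60% of population A is in the two oldest groups, compared with about 28% of population B.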

Let’s Find Out!

To compare the two populations, we need to use a standard population. A quick way to get a standard population is to use either population A or population B as a standard. Another way is to use the United States’ population distribution as a standard. Whatever you choose, you have to then compare the two populations’ age-specific death rates to that standard. For this example, I created a standard population:

Age Group	Population – Standard
0-15	20,000
16-35	30,000
36-55	35,000
56-75	37,000
Over 75	28,000
Total	150,000

To this standard population, we are going to apply the age-specific death rates from population A (and then B). That gives us the number of deaths we would expect in the standard population if it experienced each population’s age-specific death rates.
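
In symbols, the calculation can be sketched like this. The notation is introduced here only for illustration: for age group i, d_i is the number of deaths and n_i the number of residents in the population being studied, s_i is the number of residents in the standard population, r_i is the age-specific death rate, and E is the total expected deaths.

```latex
r_i = \frac{d_i}{n_i}, \qquad
E = \sum_i r_i \, s_i, \qquad
\text{adjusted rate} = \frac{E}{\sum_i s_i}
```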

Age Group	Population Standard	Age-specific death rate Population A	Expected Deaths Population A
0-15	20,000	10 per 1,000 (0.01)	200
16-35	30,000	50 per 1,000 (0.05)	1,500
36-55	35,000	70 per 1,000 (0.07)	2,450
56-75	37,000	100 per 1,000 (0.10)	3,700
Over 75	28,000	150 per 1,000 (0.15)	4,200
Total	150,000	Total Expected Deaths	12,050

Now, let’s do it for population B (and see if you notice a pattern):

Age Group	Population Standard	Age-specific death rate Population B	Expected Deaths Population B
0-15	20,000	10 per 1,000 (0.01)	200
16-35	30,000	50 per 1,000 (0.05)	1,500
36-55	35,000	70 per 1,000 (0.07)	2,450
56-75	37,000	100 per 1,000 (0.10)	3,700
Over 75	28,000	150 per 1,000 (0.15)	4,200
Total	150,000	Total Expected Deaths	12,050

Do you see it? The age-specific death rates for the two populations are exactly the same. You just got different totals in the crude rates because of the different age distributions. Once we applied the age-specific death rates to the standard population (aka “reference population”), we get the same number of expected deaths: 12,050.

With that, we get the same adjusted death rates for the two populations: 12,050 divided by 150,000, which is 0.080 or 80 per 1,000 residents. From this, you can conclude that the observed difference in the crude death rates is more a function of the different age distributions of the two populations than of actual differences in their death rates.
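
The by-hand steps above can also be written out in a few lines of base R, before reaching for any package. Treat this as a minimal sketch: the object names are made up for this example and the numbers come straight from the tables.

```r
age_groups <- c("0-15", "16-35", "36-55", "56-75", ">75")
population_a <- c(10000, 12000, 15000, 25000, 30000)
population_b <- c(25000, 15000, 20000, 12000, 11000)
deaths_a <- c(100, 600, 1050, 2500, 4500)
deaths_b <- c(250, 750, 1400, 1200, 1650)
std_pop <- c(20000, 30000, 35000, 37000, 28000)

# Step 1: age-specific death rates in each population
rate_a <- deaths_a / population_a
rate_b <- deaths_b / population_b

# Step 2: expected deaths if the standard population had those rates
expected_a <- rate_a * std_pop
expected_b <- rate_b * std_pop

# Step 3: adjusted rate = total expected deaths / total standard population
adjusted_a <- sum(expected_a) / sum(std_pop)
adjusted_b <- sum(expected_b) / sum(std_pop)

by_hand <- data.frame(age = age_groups, std_pop,
                      rate_a, expected_a, rate_b, expected_b)
print(by_hand)
print(round(1000 * c(A = adjusted_a, B = adjusted_b)))  # per 1,000 residents
```

Both adjusted rates come out to 80 per 1,000, matching the tables above.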

What About R?

To do this in R, we are going to need three packages: “dsr”, “epitools” and “knitr”. So pop open a new R script in RStudio and type:

library(knitr)
library(dsr)
library(epitools)

Next, we are going to create our data:

age_groups <- c("0-15", "16-35", "36-55", "56-75", ">75") # Labels for Age Groups
population_a <- c(10000, 12000, 15000, 25000, 30000) # Population A
population_b <- c(25000, 15000, 20000, 12000, 11000) # Population B
deaths_a <- c(100, 600, 1050, 2500, 4500) # Deaths in Population A
deaths_b <- c(250, 750, 1400, 1200, 1650) # Deaths in Population B
std_pop <- c(20000, 30000, 35000, 37000, 28000) # Population of Standard Population

Note that we basically created six vectors to represent the data in the tables above. From these vectors, we are going to make some data frames.

df_a <- data.frame(age=age_groups, # Data frame for Population A
                   death=deaths_a,
                   fu=population_a, # fu (follow-up) can be person-time or population counts
                   population='A')

df_b <- data.frame(age=age_groups, # Data frame for Population B
                   death=deaths_b,
                   fu=population_b, # fu (follow-up) can be person-time or population counts
                   population='B')

df_all <- rbind(df_a, df_b) # Data frame combining both populations

df_pop <- data.frame(age=age_groups, # Data frame for the Standard/Reference Population
                     pop=std_pop)

With our main data frame “df_all” we can now use “dsr” to do the hard work for us, and we are going to store the results in “my_results” so we can look at them later.

my_results <- dsr(data=df_all, # What data frame are you using for your data
                  event=death, # The count of the events, in this case, deaths
                  fu=fu, # The follow-up time or, in this case, population counts
                  subgroup=population, # The subgroups "population" and "age"
                  age,                 # "population" meaning A and B, and "age" meaning the age groups
                  refdata=df_pop, # The reference data frame
                  method="gamma", # The method for the confidence interval
                  sig=0.95, # The level of confidence for the 95% confidence interval
                  mp=1000, # The rate multiplier is "per 1,000" in this case
                  decimals=0) # The number of decimals for your results

Now that we have the results in “my_results”, let’s look at them in a table:

knitr::kable(my_results)

Subgroup	Numerator	Denominator	Crude Rate (per 1000)	95% LCL (Crude)	95% UCL (Crude)	Std Rate (per 1000)	95% LCL (Std)	95% UCL (Std)
A	8750	92000	95	93	97	80	79	82
B	5250	83000	63	62	65	80	79	83

Voila! You have the same results as you did doing it by hand, above. Congrats.

Of course, you can also use “epitools” to calculate these adjusted rates:

pop_a_adjust <- ageadjust.direct(count = deaths_a, # The vector of values for deaths for Pop A
                                 pop = population_a, # The population of Pop A
                                 rate = NULL, # Age-specific rates, if you had those instead of counts; NULL computes them from count/pop
                                 stdpop = std_pop, # Your standard/reference population
                                 conf.level = 0.95) # The confidence level for the 95% confidence interval
round(10^3*pop_a_adjust,0) # Look at the results, as rates per 1,000

pop_b_adjust <- ageadjust.direct(count = deaths_b,
                                 pop = population_b,
                                 rate = NULL,
                                 stdpop = std_pop,
                                 conf.level = 0.95)
round(10^3*pop_b_adjust,0)

From this, you don’t get a fancy table (though you could make it one with some extra coding). You get an output, for Population A:

crude.rate   adj.rate        lci        uci 
        95         80         79         82

And for Population B:

crude.rate   adj.rate        lci        uci 
        63         80         78         83
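
If you do want a table out of the “epitools” results, one possible approach is to stack the two outputs with rbind() and hand the result to knitr::kable(). This is only a sketch of the “extra coding” idea; the name “adjusted_table” is made up for this example, and the data vectors are repeated so the snippet runs on its own.

```r
library(epitools)
library(knitr)

population_a <- c(10000, 12000, 15000, 25000, 30000)
population_b <- c(25000, 15000, 20000, 12000, 11000)
deaths_a <- c(100, 600, 1050, 2500, 4500)
deaths_b <- c(250, 750, 1400, 1200, 1650)
std_pop <- c(20000, 30000, 35000, 37000, 28000)

pop_a_adjust <- ageadjust.direct(count = deaths_a, pop = population_a,
                                 stdpop = std_pop, conf.level = 0.95)
pop_b_adjust <- ageadjust.direct(count = deaths_b, pop = population_b,
                                 stdpop = std_pop, conf.level = 0.95)

# Each result is a named numeric vector, so rbind() stacks them
# into a matrix with one row per population
adjusted_table <- rbind(A = pop_a_adjust, B = pop_b_adjust)

# Convert to rates per 1,000 and round to whole numbers
adjusted_table <- round(1000 * adjusted_table, 0)

knitr::kable(adjusted_table)
```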

And there you have it. Two ways to do this in R. You can now save the R file (or download the file I used by clicking here). If you have a lot of data, and you don’t want to write out the individual vectors at the beginning, then I suggest you learn the basics of importing data into R and then use that with the packages mentioned above.

Big thanks to these folks and these folks for their examples on how to use dsr and epitools, respectively.

This work is licensed under a Creative Commons Attribution 4.0 International License.

