3  Results

3.1 Key Takeaways

We divided our key findings into three separate parts, one for each dataset inspected. First, to explore demographic trends we investigated the database of opioid overdoses. Then, we compared that information with that within the healthcare spending dataset. Finally, we compared the overdose statistics to the unemployment information.

Below is a line plot which graphs the rate of overdoses by drug type for the 19 year span between 1999 and 2017. The y-axis shows the number of people who passed away for a given population of 100,000 individuals. By investigating a rate instead of a total, we can ignore the effect of the growing population.

Code
library(ggplot2)
library(tidyr)
library(dplyr, warn.conflicts = FALSE)
options(dplyr.summarise.inform = FALSE)
options(warn=-1)

drug <- read.csv('~/Downloads/drug_overdose.csv')
drug_total <- drug[ which(drug$STUB_NAME=='Total'& drug$UNIT == 'Deaths per 100,000 resident population, age-adjusted' & drug$PANEL_NUM != 0 & drug$PANEL_NUM != 1), ]
drug_total <- drug_total[c("PANEL_NUM", "ESTIMATE", "YEAR")]

drug_total <- drug_total %>%
  mutate(PANELS= case_when(
    PANEL_NUM==2 ~ "Natural or Semisynthetic",
    PANEL_NUM==3 ~ "Methadone",
    PANEL_NUM==4 ~ "Other Synthetic",
    PANEL_NUM==5 ~ "Heroin"
    ))
ggplot(drug_total, aes(x=YEAR, y=ESTIMATE, group=factor(PANELS), color=factor(PANELS))) +
  geom_line() +
  coord_fixed() +
  ggtitle("Number of Overdoses per 100,000 People") +
  xlab("Year") +
  ylab("Overdose Rate") +
  labs(color = "Drug Type") +
  scale_x_continuous(breaks = seq(1999, 2017, by=1)) +
  scale_y_continuous(n.breaks = 10) +
  theme(axis.text.x = element_text(angle=45, hjust=1,size=6),
        axis.text.y = element_text(angle=0, hjust=1,size=8),
        panel.background = element_rect(fill = 'white', color = 'gray80', size=0.25),
        panel.grid.major = element_line(color = 'gray80', size = 0.25),
        panel.grid.minor = element_line(color = 'gray95', size = 0.1),
        legend.position = c(0.2, 0.75), 
        legend.background = element_rect(fill = "white", colour = NA),
        legend.key.size = unit(0.5, 'cm'),
        legend.title = element_text(size=8), 
        legend.text = element_text(size=8))   

One of the research questions we were interested in solving was whether or not the CDC was correct about the three distinct spikes. They claimed that there was a dramatic rise in heroin use in 2010, and another such rise in synthetic opioids in 2013. Unfortunately, our dataset does not go back to the 1990’s so we cannot see about the rise of prescription drug abuse. However, there are clearly dramatic inclines for heroin (red line) and synthetic opioids (cyan line) on the specified years. So we can conclusively say that the CDC was valid with their assertion.

3.3 National Health Expenditures

In this section, we will look at graphs that display trends between drug fatalities and the cost of healthcare in the United States. The reason we chose this specific topic was by asking ourselves what external factors may cause individuals to abuse drugs. We hypothesized that people would self-medicate using narcotics more if the price of healthcare was too expensive. Below are two line plots, one which displays the average cost of healthcare per person, and the other which shows the average cost of prescription drugs per person. This information was calculated by extracting healthcare spending data from the NHE database and dividing by the US population for each year. The result shows how much money was spent, on average, by each person in the country.

Code
library(tibble)
suppressPackageStartupMessages(library(gridExtra))

nhe <- read.csv('~/Downloads/NHE2021.csv')
pop1 = read.csv("~/Downloads/POPTOTUSA647NWDB.csv")
pop <- pop1
library(data.table, warn.conflicts = FALSE)
nhe <- transpose(nhe)
nhe_new <- nhe
colnames(nhe_new) <- nhe[1, ]
nhe <- nhe_new
nhe <- nhe[-1,]
rownames(nhe) <- c(1960:2021)
nhe$Year <- rownames(nhe)
nhe <- subset(nhe, Year > 1998 & Year < 2018)
nhe <- nhe %>% select(c(2, 4, 5, 6, 282, 284, 285, 286))
colnames(nhe) <- c('Out of pocket', 'Private Health Insurance', 'Medicare', 'Medicaid', 'Out of pocket-Drug', 'Private Health Insurance-Drug',   'Medicare-Drug', 'Medicaid-Drug' )

nhe <- tibble::rownames_to_column(nhe, "Year")

pop <- pop %>%
  mutate(across(everything(), as.character))


nhe <- bind_cols(nhe, pop['POPTOTUSA647NWDB'])

nhe$POPTOTUSA647NWDB <- as.numeric(nhe$POPTOTUSA647NWDB)
nhe$`Out of pocket`             <-  1000000*as.numeric(gsub(",", "", nhe$`Out of pocket`))/nhe$POPTOTUSA647NWDB
nhe$`Private Health Insurance`  <-  1000000*as.numeric(gsub(",", "", nhe$`Private Health Insurance`))/nhe$POPTOTUSA647NWDB
nhe$Medicare                    <- 1000000*as.numeric(gsub(",", "", nhe$Medicare))/nhe$POPTOTUSA647NWDB
nhe$Medicaid                    <- 1000000*as.numeric(gsub(",", "", nhe$Medicaid))/nhe$POPTOTUSA647NWDB
nhe$`Out of pocket-Drug`              <-  1000000*as.numeric(gsub(",", "", nhe$`Out of pocket-Drug`))/nhe$POPTOTUSA647NWDB
nhe$`Private Health Insurance-Drug`   <-  1000000*as.numeric(gsub(",", "", nhe$`Private Health Insurance-Drug`))/nhe$POPTOTUSA647NWDB
nhe$`Medicare-Drug`                    <- 1000000*as.numeric(gsub(",", "", nhe$`Medicare-Drug`))/nhe$POPTOTUSA647NWDB
nhe$`Medicaid-Drug`                    <- 1000000*as.numeric(gsub(",", "", nhe$`Medicaid-Drug`))/nhe$POPTOTUSA647NWDB

nhe$POPTOTUSA647NWDB <- NULL


 nhe_long <- nhe %>% 
   pivot_longer(!Year, names_to="Category", values_to="Amount")
 nhe_long$Amount <- as.integer(nhe_long$Amount)
 
  nhe_long$Type <- with(nhe_long, ifelse(endsWith(Category, "Drug"), "Drug", "General"))




xs <- split(nhe_long,f = nhe_long$Type)


p1 <- ggplot(xs$General, aes(x=Year, y=Amount, group=factor(Category))) +
         geom_line(aes(color=Category)) +
        geom_vline(xintercept = "2013", color="red3") +
        geom_vline(xintercept = "2010", color="red4") +
        facet_wrap(~Type, scales="free") +
        theme(panel.background = element_rect(fill = 'white', color = 'gray80', size=0.25),
              panel.grid.major = element_line(color = 'gray80', size = 0.25),
              panel.grid.minor = element_line(color = 'gray95', size = 0.1),axis.text.x = element_text(angle=45, hjust=1,size=6),
              axis.text.y = element_text(angle=0, hjust=1,size=6),
              legend.position = c(0.225, 0.9), 
              legend.background = element_rect(fill = "white", colour = NA),
              legend.key.size = unit(.05, 'cm'),
              legend.title = element_text(size=6), 
              legend.text = element_text(size=6))


p2 <- p1 %+% xs$Drug  +
        scale_y_continuous(breaks = seq(0,500, by=25)) +
        annotate("text", x=10, y=270, label="2010\n heroin\n spike", size=3, angle=0, color="red4") +
        annotate("text", x=17, y=315, label="2013\n synthetics\n spike", size=3, angle=0, color="red3") +
        labs(title= "Pharmaceutical Costs",
        y = "Pharmaceutical Costs per Person",
        x= "Year")

p1 <- p1 %+% 
        scale_y_continuous(breaks = seq(600, 3400, by=200)) +
        annotate("text", x=10, y=2200, label="2010\n heroin\n spike", size=3, angle=0, color="red4") +
        annotate("text", x=17, 2400, label="2013\n synthetics\n spike", size=3, angle=0, color="red3") +
        labs(title= "Total Healthcare Costs",
        y = "Healthcare Costs per Person",
        x= "Year")

grid.arrange(p1,p2, nrow=1)

We have included vertical lines to mark the rise in heroin and sythetics in the years 2010 and 2013, respectively. One pattern that jumped out at us was in the year 2013, when the cost of private health insurance skyrocketed, both in general and in pharmaceutical payments. The NHE defines this category as “the net cost of private health insurance which is the difference between health premiums earned and benefits incurred”. Interestingly, the boom in health insurance prices in 2013 reflects a similar rise in synthetic opioid abuse. It is our contention that this inflated cost of care caused people to seek alternate solutions to medicate themselves, namely narcotics. However, correlation does not prove causation and these results may simply be coincidental. The other three categories (Medicare, Medicaid, and out of pocket expenses) do not provide us with any meaningful takeaways on the financial toll incurred by the population of this country.

Below we created two scatterplots to see if there is any discernible correlation between the cost of healthcare/pharmaceuticals and the overdose rate for opioids.

Code
drug_all <- drug[ which(drug$STUB_NAME=='Total'& drug$UNIT == 'Deaths per 100,000 resident population, age-adjusted' & drug$PANEL_NUM == 1), ]
drug_all <- filter(drug_all, YEAR != "2018")

expenditures_drug <- filter(nhe_long, Category == "Out of pocket-Drug" | Category == "Private Health Insurance-Drug")
expenditures_drug <- expenditures_drug %>% group_by(Year) %>%
                                 summarize(drug_spending = sum(Amount),
                                 .groups = 'drop') 

expenditures_all <- filter(nhe_long, Category == "Out of pocket" | Category == "Private Health Insurance")
expenditures_all <- expenditures_all %>% group_by(Year) %>%
                                 summarize(healthcare_spending = sum(Amount),
                                 .groups = 'drop')

combined <- bind_cols(drug_all['ESTIMATE'], drug_all['YEAR'], expenditures_all['healthcare_spending'], expenditures_drug['drug_spending'], pop1['POPTOTUSA647NWDB'])


p1 <- ggplot(combined, aes(x= healthcare_spending, y= ESTIMATE, label=YEAR))+
  geom_point(size=0.5, color="red4") +
  geom_text(size=2, hjust=0.5, vjust=1.5) +
  scale_x_continuous(n.breaks = 15, breaks=waiver()) + 
  scale_y_continuous(breaks = seq(0,15,1)) +
  theme(panel.background = element_rect(fill = 'white', color = 'gray80', size=0.25),
          panel.grid.major = element_line(color = 'gray80', size = 0.25),
          panel.grid.minor = element_line(color = 'gray95', size = 0.1),
          axis.text.x = element_text(angle=45, hjust=1,size=6),
          axis.text.y = element_text(angle=0, hjust=1,size=6)) +
  labs(title= "Opioid Overdoses \n vs. Cost of Healthcare",
       y= "Total Overdoses per 100,000 People",
       x= "Average Healthcare Costs \n(Private Insurance + Out of Pocket)\n Per Person")
  

p2 <- ggplot(combined, aes(x= drug_spending, y= ESTIMATE, label=YEAR))+
  geom_point(size=0.5, color="red4") +
  geom_text(size=2, hjust=0.5, vjust=1.5) +
  scale_x_continuous(n.breaks = 15, breaks=waiver()) + 
  scale_y_continuous(breaks = seq(0,15,1)) +
  theme(panel.background = element_rect(fill = 'white', color = 'gray80', size=0.25),
          panel.grid.major = element_line(color = 'gray80', size = 0.25),
          panel.grid.minor = element_line(color = 'gray95', size = 0.1),
          axis.text.x = element_text(angle=45, hjust=1,size=6),
          axis.text.y = element_text(angle=0, hjust=1,size=6)) +
  labs(title= "Opioid Overdoses \n vs. Cost of Pharmaceuticals",
       y = "",
       x= "Average Pharmaceutical Costs \n(Private Insurance + Out of Pocket)\n Per Person")

grid.arrange(p1,p2, nrow=1)

Code
corr1 <- cor(combined$ESTIMATE, combined$healthcare_spending)
print(paste("Correlation coefficient for healthcare spending and overdoses: ", corr1))
[1] "Correlation coefficient for healthcare spending and overdoses:  0.914776421295715"
Code
corr2 <- cor(combined$ESTIMATE, combined$drug_spending)
print(paste("Correlation coefficient for pharmaceutical spending and overdoses: ", corr2))
[1] "Correlation coefficient for pharmaceutical spending and overdoses:  0.692457900045326"

Each point represents a year in the time frame between 1999 and 2017. On the x-axis is the average cost of healthcare/prescription drugs per person. This value is the sum of out-of-pocket costs and health insurance costs. The y-axis is the rate of opioid-related fatalities per 100,000 individuals. Using the correlation function in R, we were able to determine that there is a 91% correlation between healthcare spending and fatality rates. We also determined that there is a 69% correlation between pharmaceutical spending and fatality rates. These trends may be coincidental, or they could point to a systematic issue within the United States healthcare system that neglects the less fortunate.

3.4 National Unemployment Rate

Last, we were interested in seeing if the national unemployment rate had any correlation with the rate of overdoses. According to research conducted by the Suicide Prevention Resource Center, “People who were unemployed were more than 16 times as likely to die by suicide as people with jobs” [1]. This being the case, we believed that there would be a positive correlation between the national unemployment rate and the overdose rate. Below is a scatterplot that compares the two rates, where each point represents a year from 1999 to 2017. This plot is interactive, so you can hover over a point to see which year it represents.

Code
drug_year <- drug %>% drop_na()
drug_year <- subset(drug_year, UNIT=='Deaths per 100,000 resident population, age-adjusted' & STUB_NAME == 'Total' & PANEL == 'Drug overdose deaths involving any opioid' & YEAR <= 2017)

unemployment <- read_csv("~/Downloads/unemployment.csv", show_col_types = FALSE)
annual <- rowMeans(unemployment[, 2:13], na.rm=T)
unemployment$Annual <- annual

drug_year <- cbind(drug_year, unemployment$Annual)
colnames(drug_year)[16] ="unemployment"
drug_year$unemployment <- round(drug_year$unemployment ,digit=3)
Code
suppressPackageStartupMessages(library(plotly, warn.conflicts = FALSE))
suppressPlotlyMessage <- function(p) {
  suppressMessages(plotly_build(p))
}



fig <- plot_ly(drug_year, x = ~ESTIMATE, y = ~unemployment, marker = list(size = 10,
                            color = 'rgba(255, 182, 193, .9)',
                            line = list(color = 'rgba(152, 0, 0, .8)',width = 2)), 
                            text = ~paste("Year: ", YEAR, '<br>Unemployment:', 
                            unemployment, '<br>Overdose Death Rate:', ESTIMATE)) %>%
  layout(title = 'Unemployment vs Overdose Death Rate', 
         xaxis = list(title = 'Average Overdose Death Rate'), 
         yaxis = list(title = 'Average Unemployment Rate'))

suppressPlotlyMessage(fig)
Code
corr3 <- cor(drug_year$ESTIMATE, drug_year$unemployment)
print(paste("Correlation coefficient for unemployment and overdoses: ", corr3))
[1] "Correlation coefficient for unemployment and overdoses:  0.0496637809558611"

Initial inspection shows that there doesn’t seem to be any discernible correlation between the unemployment rate and and the overdose rate. Using R, we have calculated that these two sets of data are 5% correlated (r=0.0497), meaning that an increase in unemployment doesn’t necessarily correspond to an increase in overdoses. While this may be the case on a national scale, it is important that we heed the warning of the Suicide Prevention Resource Center and offer support to our friends and families who are unemployed.

[1] https://sprc.org/news/unemployment-and-suicide/