INTRODUCTION

The YouTube link to our comprehensive presentation can be found here.

The opioid crisis in Mesa, Arizona is a public health emergency that must be addressed. A spike in overdoses at the onset of the COVID-19 pandemic in early 2020 brought a 76% increase in opioid overdose incidents and a 132% increase in opioid-related deaths1. The problem is not a new one, however. Following an alarming increase in opioid deaths in 2016, Governor Ducey declared a state of emergency on June 5, 2017, and the Mesa County Opioid Response Group was established in June 2018 to reduce the community impact of opioids through prevention, treatment, and recovery services. The goal of the group is to prevent substance misuse through:

  • Educating community members & health providers
  • Improving treatment access & retention
  • Reducing harm by providing educational information and training on Naloxone, the life-saving overdose reversal drug, including how to obtain and use it

The state of emergency was officially terminated in May 2018, despite the growing numbers. Figure 1, sourced from data.mesaaz.gov, illustrates the urgency of the rising overdose incident count, which reached a record high of 211 in September of this year.

figure 1. View the interactive chart via data.mesaaz.gov


We believe in the power of predictive modeling to assist in the critical decision-making process around allocating resources to the opioid crisis. The goal of this analysis is to estimate a geospatial risk prediction model, predicting overdoses as a function of environmental factors like crime, proximity to city facilities & services, and demographic indicators. The process includes wrangling data from Mesa’s Open Data portal and engineering it into powerful predictive features. We use a Poisson regression, cross-validated with both k-fold and Leave One Group Out (LOGO) methods. We are essentially borrowing the experience of observed opioid overdose incidents from 2018, and will compare our predictions to observed opioid overdose incidents from 2019. Predictions from this model are interpreted as ‘the forecasted risk/opportunity of that event occurring here’. A series of ‘goodness of fit’ indicators will tell us how well our model is performing, and whether our predictions generalize well across time and space.

Opioid Observation and Treatment Center

Our end goal with the geospatial risk model is to produce a tool that Mesa health officials can use to implement harm reduction strategies through proactive resource allocation. We propose an updated response program that will create facilities throughout Mesa called Opioid Observation and Treatment Centers (OOTC). An OOTC will serve two purposes:

  • Be a safe space that provides support, safe monitoring/safe injection services, overdose treatment, and primary care for individuals who are using and would otherwise be on the street.

  • Have staff and resources prepared for emergency response to overdose incidents, catering to both public and residential overdoses.

Web Application

Through this analysis, we hope to identify the locations throughout Mesa that will benefit most from the creation of Opioid Observation and Treatment Centers, and to provide recommendations for future intervention. We have conceptualized a web application that will give access to the automatically updating output of our machine learning model. City and healthcare officials can use this output to properly allocate staff and resources toward creating these centers, with the goal of mitigating opioid overdose deaths.


EXPLORATORY ANALYSIS

Wrangling Data

Public data sets used for this analysis are as follows:
- Fire and Medical Opioid Overdose Incidents Confirmed cases of opioid overdose; locations are not exact but are rounded to approximately 1/3 mile increments. An opioid overdose is confirmed by 1) patient or witness verification, 2) opioids found on scene, or 3) positive response to Narcan treatment.
- City Property Data filtered by facility type - child crisis center, fire and police station, public housing, arts and education centers.
- Crime Data Incidents based on initial police reports taken by officers responding to calls for service. The data are modified for public use: addresses and locations are not exact and have been rounded to the nearest hundred block, and lat/long values are approximations based on the rounded hundred block. Incidents reported in this dataset may not correlate with the 911 Events and calls for Police service datasets. The City of Mesa does not disclose information that is inflammatory in nature or that impacts its citizens. Split by type: misdemeanors and felonies.
- Park Locations And Amenities Information about location and amenities available at developed city parks, park open spaces (basins) maintained by the City of Mesa. Pools, Recreation Centers and other public gathering facilities owned and/or maintained by the City of Mesa’s Parks Department such as convention center, amphitheater and cemetery are also listed.
- Light Rail Stations This dataset provides information regarding the location of current light rail stations within the City of Mesa and each station’s name and address.
- City Boundary Map City of Mesa jurisdiction.
- Zoning Districts Specifically delineated geographic areas in the city within which regulations and requirements uniformly govern the use of land - separated into high density residential, low density residential, downtown, commercial, and industrial.
- Census ACS Data Demographic and socio-economic indicators at the tract level.




Figure 2 illustrates the relationship of zoning districts to the opioid overdose incident points. Because the overdose incident points are rounded to approximately 1/3 mi. intervals for privacy reasons, we have aggregated the number of overdoses at each point and visualized each point as a graduated symbol representing the count. This is how overdose incident points will be illustrated throughout the analysis, with the exception of the fishnet.

The Fishnet

A municipal or government boundary, be it a Census block group or a police district, is often drawn arbitrarily and does not represent a consistent division of space. We want to analyze the risk of opioid overdose evenly across space and cannot be limited by these boundaries. For this reason, we create a ‘fishnet’, or grid of cells at 1/3 mi. dimensions, to which we aggregate our point data. The risk predictions will then be represented as a smooth surface. Figure 3 illustrates opioid overdose incidents from 2018, where darker cells indicate a higher count of incidents.
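As a minimal sketch (condensed from the Code Appendix, and assuming the city_boundary and opioid_data18 objects built there), the fishnet and the aggregated overdose counts are created roughly as follows; 1,760 ft equals 1/3 mile in the state plane (feet) projection used throughout.

library(sf)
library(dplyr)
library(tidyr)

# 1/3 mile = 1,760 ft in the state plane (feet) CRS used in the appendix
fishnet <- st_make_grid(city_boundary, cellsize = 1760, square = TRUE) %>%
  .[city_boundary] %>%                    # keep only cells that touch the city
  st_sf() %>%
  mutate(uniqueID = rownames(.))

# aggregate the 2018 overdose points to a count per fishnet cell
opioid_net <- dplyr::select(opioid_data18) %>%
  mutate(countoverdose = 1) %>%
  aggregate(., fishnet, sum) %>%
  mutate(countoverdose = replace_na(countoverdose, 0))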

Figures 4.1 - 4.4 follow the same process, but with our independent variables wrangled from our data sets.







Nearest Neighbor

Figures 4.5 and 4.6 represent the nearest neighbor function applied to our city property and census data. The nearest neighbor function calculates the average distance from each fishnet cell to the closest 3 points of the variable; this creates a graduated surface because each fishnet cell takes on the calculated average distance as its value. We will determine whether to use the aggregated count or nearest neighbor version of these variables in the final Poisson regression by analyzing the correlation of each independent variable with the dependent variable later in the analysis. Our next step is to explore the spatial process of opioid overdoses in Mesa.
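A minimal sketch of what the nearest neighbor feature computes, assuming the point_vars_net and parks objects from the Code Appendix; the avg_nn_distance() helper below is a stand-in for the nn_function() sourced from the urbanSpatial functions.r.

library(sf)
library(FNN)

# average distance from each query point to its k closest target points
avg_nn_distance <- function(measureFrom, measureTo, k = 3) {
  nn <- FNN::get.knnx(data = measureTo, query = measureFrom, k = k)
  rowMeans(nn$nn.dist)
}

# e.g. average distance from each fishnet cell centroid to the 3 nearest parks
Park.nn <- avg_nn_distance(st_coordinates(st_centroid(point_vars_net)),
                           st_coordinates(parks), k = 3)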



Local Moran’s I

To test the spatial process, we use a statistic called Local Moran’s I. Here, the null hypothesis is that the overdose count at a given location is randomly distributed relative to its immediate neighbors. We create a neighbor list and a spatial weights matrix using queen contiguity. Our output, figure 5.1, depicts where there is relatively high local clustering of overdose incidents. We see this clustering also represented in the spatial process of the p-value, where lighter pixel colors represent a lower p-value. Because our null hypothesis states that there is no significant local clustering, a small p-value indicates the areas in which we can reject the null hypothesis in favor of the alternative: that there is in fact local clustering. The significant hot spots are the fishnet cells whose p-values fall at or below our cutoff (p <= 0.001).

Figure 5.2 illustrates the distance function applied to our significant hot spot output, creating a value for each fishnet cell that represents the distance from the cell's centroid to the nearest significant cluster. This value will be used as a predictive feature in our Poisson regression model.
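Condensed from the Code Appendix (and assuming the final_net object and the sourced nn_function() exist), the Local Moran’s I statistic, the hotspot flag, and the distance-to-hotspot feature are computed roughly as follows.

library(sf)
library(dplyr)
library(spdep)

# queen contiguity neighbours and row-standardised spatial weights for the fishnet
final_net.nb      <- poly2nb(as_Spatial(final_net), queen = TRUE)
final_net.weights <- nb2listw(final_net.nb, style = "W", zero.policy = TRUE)

# Local Moran's I of the 2018 overdose counts; column 5 holds the local p-value
local_morans <- localmoran(final_net$countoverdose, final_net.weights, zero.policy = TRUE)

# flag significant hotspot cells (the appendix uses a strict p <= 0.001 cutoff)
final_net$overdose.isSig <- ifelse(local_morans[, 5] <= 0.001, 1, 0)

# distance from each cell centroid to the nearest significant hotspot cell
hotspots <- filter(final_net, overdose.isSig == 1)
final_net$overdose.isSig.dist <- nn_function(st_coordinates(st_centroid(final_net)),
                                             st_coordinates(st_centroid(hotspots)), 1)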


Variable Correlation

For the final step of our exploratory analysis, we plot the Pearson correlation of each independent variable with our dependent variable. If the Pearson coefficient (the r value displayed in the top left corner of each scatter plot) is close to 1, we have a strong positive correlation; if the r value is close to -1, we have a strong negative correlation. The closer the r value is to 0, the weaker the correlation between our variables. We can see that both felonies and misdemeanors have strong correlations with overdose incidents, even after we have filtered out drug-related crimes. This step helps us determine which variables should be used in our regression model. For example, Low.Income has a positive correlation of 0.45, while the nearest neighbor version of this variable, low_inc.nn, has a negative correlation of -0.29. We choose Low.Income for our model because change in demographic indicators such as income level or race is not a smooth function of distance; proximity does not account for “the other side of the tracks” phenomenon, where social and economic status is contained within a geographic boundary. And we know low-income neighborhoods often face more drug-related issues, so we expect a positive correlation. Finally, we exclude the Downtown variable because it caused extreme errors in our regression.
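A condensed sketch of this correlation step (assuming final_net from the Code Appendix): each feature is gathered into long form and its Pearson r with the overdose count is computed.

library(sf)
library(dplyr)
library(tidyr)

correlations <-
  st_drop_geometry(final_net) %>%
  dplyr::select(-uniqueID, -cvID, -GEOID) %>%
  gather(Variable, Value, -countoverdose) %>%
  group_by(Variable) %>%
  summarize(r = cor(Value, countoverdose, use = "complete.obs")) %>%
  arrange(desc(abs(r)))    # strongest correlations (either sign) first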




POISSON REGRESSION

The histogram in figure 7, depicting the distribution of overdose occurrences in 2018, tells us how heavily skewed toward 0 our data are. This is understandable, as the vast majority of our fishnet cells have no overdose occurrences within them. The Poisson distribution most closely resembles our overdose distribution, therefore we will use a Poisson regression.
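For intuition, a single, non-cross-validated Poisson fit on the full fishnet looks like the sketch below (assuming final_net and the reg.ss.vars feature list defined in the Code Appendix); the cross-validated fits in the next section wrap this same glm() call inside crossValidate().

library(sf)
library(dplyr)

poisson_fit <- glm(countoverdose ~ .,
                   family = "poisson",
                   data = st_drop_geometry(final_net) %>%
                     dplyr::select(countoverdose, all_of(reg.ss.vars)))

summary(poisson_fit)   # coefficients are on the log scale of the expected count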




The variables we choose to use in our regression are as follows:

Commercial, High.Density.Residential, Low.Density.Residential, Park.nn, Arts.and.Education.nn, Police.Fire.nn, Public.Housing, Vacant.Property, Low.Income, Low.Rent, Majority.White, Majority.Hispanic, Child.Crisis.Center, Industrial, misdemeanor, felony, lightrail, overdose.isSig.dist, overdose.isSig


Cross Validation

While an accurate model is very important, we are more concerned with generalizability for the geospatial risk model. This tells us how well our model performs on new data and how well it recognizes spatial group contexts, like demographic and racial differences across neighborhoods. Our methods for cross-validating are k-fold cross validation (CV) and leave-one-group-out cross validation (LOGO-CV). K-fold takes a group of our fishnet cells, trains the model on the remaining cells, tests the prediction on that group, records a goodness of fit metric, and moves on to the next group of cells until every cell has been trained and tested on. LOGO-CV follows a similar process, except we use the Census tract GEOID as the geographic area we exclude and test on. LOGO-CV would ideally use neighborhood boundaries, because Census tracts are often arbitrary, but we were limited by the data available to us. We also want to see how each cross validation method performs with and without spatial context, by including our Moran’s I significance indicator. Therefore, we run four iterations of our cross validation (a condensed sketch of the code follows the list):

1. K-fold CV - Just risk factors
2. K-fold CV - Spatial Process
3. LOGO-CV - Just risk factors
4. LOGO-CV - Spatial Process
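Condensed from the Code Appendix, the four runs above reuse the same crossValidate() helper, swapping only the fold identifier (random cvID vs. Census tract GEOID) and the feature list (reg.vars vs. reg.ss.vars, which adds the two Local Moran’s I terms).

# k-fold CV (random fold id)
reg.CV           <- crossValidate(final_net, id = "cvID",  dependentVariable = "countoverdose", indVariables = reg.vars)
reg.ss.CV        <- crossValidate(final_net, id = "cvID",  dependentVariable = "countoverdose", indVariables = reg.ss.vars)

# LOGO-CV (hold out one Census tract at a time)
reg.spatialCV    <- crossValidate(final_net, id = "GEOID", dependentVariable = "countoverdose", indVariables = reg.vars)
reg.ss.spatialCV <- crossValidate(final_net, id = "GEOID", dependentVariable = "countoverdose", indVariables = reg.ss.vars)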

The hope here is to confirm that including the spatial process improves our model. Figures 8 and 9 visualize the mean absolute errors (MAE) made by our model. The histograms in figure 8 show the distribution of the MAE across folds for each regression iteration. We are pleased to see the distribution of mean absolute errors for each fold tighten toward 0 when the spatial process is included. And as we can see in the MAE table for each regression below, our average error, while only marginally different, decreases for our LOGO-CV when the spatial process is accounted for, confirming our expectation.




Regression Mean_MAE SD_MAE
Random k-fold CV: Just Risk Factors 0.14 0.11
Random k-fold CV: Spatial Process 0.13 0.10
Spatial LOGO-CV: Just Risk Factors 0.28 0.37
Spatial LOGO-CV: Spatial Process 0.27 0.33

LOGO-CV will have a higher mean absolute error because it is a more conservative prediction of our dependent variable, but again we are more concerned with the generalizability than the accuracy of our model. The table below outputs the Global Moran’s I and the p-value for each iteration of our LOGO-CV model; a condensed sketch of how these are computed follows the table.

Regression Morans_I p_value
Spatial LOGO-CV: Just Risk Factors 0.2790521 0.001
Spatial LOGO-CV: Spatial Process 0.3829658 0.001
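Condensed from the Code Appendix, the Moran’s I values above come from a permutation (Monte Carlo) test on the absolute LOGO-CV fold errors: spatial weights are built on the LOGO fold geometries (the appendix builds the analogous neighborhood.weights object) and moran.mc() shuffles the errors 999 times to see whether the observed clustering could arise by chance.

library(sf)
library(dplyr)
library(spdep)

# weights built on the LOGO-CV fold geometries (error_by_reg_and_fold from the appendix)
logo_errors <- filter(error_by_reg_and_fold,
                      Regression == "Spatial LOGO-CV: Spatial Process")

logo_weights <- nb2listw(poly2nb(as_Spatial(logo_errors), queen = TRUE),
                         style = "W", zero.policy = TRUE)

# permutation test of spatial autocorrelation in the absolute errors
moran.mc(abs(logo_errors$Mean_Error), logo_weights,
         nsim = 999, zero.policy = TRUE, na.action = na.omit)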


The maps below in figure 11 plot our predictions for the LOGO-CV. We can see that the spatial process iteration of our LOGO-CV is slightly better at picking up the hotspots. The two maps on the left indicate ‘latent overdose risk’, while the ‘observed’ map on the right allows us to visually compare our predictions to overdose events that have already occurred. The charts in figure 12 tell us that all iterations of our model over-predict in areas of low overdose risk and under-predict in areas of high overdose risk.



Ethnicity Context

Census data tell us that Mesa, Arizona’s population is about 30% Hispanic. The table below displays the mean model error for census tracts within the city that are majority Hispanic vs. majority non-Hispanic (a condensed sketch of the computation follows the table). Ideally, our model should under-predict in majority Hispanic areas and over-predict in majority non-Hispanic areas. We can confirm that our model generalizes well despite ethnic variation across Census tracts, and our output is favorable.

Regression Majority_Hispanic Majority_Non_Hispanic
Random k-fold CV: Just Risk Factors -0.1378604 0.0228891
Random k-fold CV: Spatial Process -0.1192920 0.0205719
Spatial LOGO-CV: Just Risk Factors -0.1242848 0.0209369
Spatial LOGO-CV: Spatial Process -0.1103633 0.0169298
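Condensed from the Code Appendix (assuming reg.summary and the blockgroups19 race-context layer built there), the table above is produced by joining each prediction cell to its 2019 block group and averaging the raw error within each context.

library(sf)
library(dplyr)
library(tidyr)

reg.summary %>%
  st_centroid() %>%                    # cell centroids, so each joins one block group
  st_join(blockgroups19) %>%           # blockgroups19 carries the raceContext label
  na.omit() %>%
  st_drop_geometry() %>%
  group_by(Regression, raceContext) %>%
  summarize(mean.Error = mean(Error, na.rm = TRUE)) %>%
  spread(raceContext, mean.Error)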


Figure 13 is our final visual indicator of how well our model predicts future opioid overdose incidents. The map on the left displays the kernel density of observed opioid overdose incidents, overlaid with graduated symbols for 2019 observation points sized by count. The map on the right displays our model's risk predictions based on observed 2018 data, overlaid with the same points observed in 2019. We are able to see that our model predicts a high density of overdose incidents in areas where there was in fact a high density of overdoses.
Ideally, in our risk prediction vs. kernel density chart for 2019, we would see the risk prediction capturing an equal, if not higher, rate than the kernel density of 2019. Although it is close, there is room for improvement in our model, and figure 15 is a good indicator of that.
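Condensed from the Code Appendix, the kernel density half of figure 13 is built with spatstat: the observed overdose points become a point pattern, density.ppp() smooths them (the 1000 below is a bandwidth in the projection's feet units), and the surface is averaged back onto the fishnet so it can be binned into the same risk categories as the model's predictions.

library(sf)
library(dplyr)
library(spatstat)

# point pattern of observed overdoses, windowed to the fishnet extent (as in the appendix)
opioid_ppp <- as.ppp(st_coordinates(opioid_data), W = st_bbox(final_net))

# kernel density surface with a 1,000 ft bandwidth
opioid_KD <- density.ppp(opioid_ppp, 1000)

# average the density surface back onto the fishnet cells
opioid_KDE_net <- as.data.frame(opioid_KD) %>%
  st_as_sf(coords = c("x", "y"), crs = st_crs(final_net)) %>%
  aggregate(., final_net, mean)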




CONCLUSION

Finally, with our model’s predictions, we are able to spatially join the commercial and downtown zoned areas to the fishnet cells with the highest predicted counts (figure 16.1). The result, depicted in figure 17, shows which blocks in the city of Mesa will benefit most from the intervention of an OOTC. The darker color indicates the commercial zoning districts that intersect with where our model predicted the highest number of overdose incidents. The blue areas surrounding them are residential zoning districts with high predicted counts that should be a secondary option to the commercial and downtown districts.
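The spatial join described above can be sketched in R as follows (assuming reg.summary and the Commercial and Downtown zoning layers from the Code Appendix; the top-decile cutoff is an assumption of this sketch, not a figure taken from the report):

library(sf)
library(dplyr)

# fishnet cells in the top decile of LOGO-CV spatial-process predictions
top_risk_cells <- reg.summary %>%
  filter(Regression == "Spatial LOGO-CV: Spatial Process") %>%
  filter(Prediction >= quantile(Prediction, 0.9))

# commercial and downtown zoning polygons that intersect those high-risk cells
candidate_zones <- rbind(Commercial, Downtown) %>%
  st_join(top_risk_cells, join = st_intersects, left = FALSE)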




Moving Forward


figure 18 (images created via ArcGIS Pro)


Our model performs well, with a relatively low mean absolute error, but of course there is always opportunity to improve our analysis. Our predictions are limited in accuracy because we do not have access to the exact locations of observed opioid overdose incidents. With permission from the city to move forward on this project, we would respectfully request unaggregated point data for the opioid overdose incidents to train our model. We would also request point data for existing health care facilities and schools, which would provide our model with valuable predictive features, and we would expect this data to improve our results. Lastly, it would be important for us to collect accurate neighborhood boundary data, so we may test the generalizability of our model across realistic neighborhood groups instead of arbitrary Census tract boundaries. Figure 18 illustrates the graphics conceptualized for the web application interface. Each property in a commercial or downtown zone that falls in a high risk category will be selectable. Ideally, we would add a filter to include only addresses of commercial buildings that are vacant or have leases available, so the user does not have to sift through locations manually. Information on each property will be provided so city officials can make informed decisions about where OOTC facilities should be placed.

We are thrilled to present this predictive model as an opportunity to improve the public health of Mesa during a time of urgent need, and hope the City of Mesa will consider implementing our tool to protect its citizens.



Olivia Scalora
Hasa Reddy


Code Appendix

#SETUP
knitr::opts_chunk$set(warning=FALSE)

library(tidyverse)
library(sf)
library(RSocrata)
library(ggmap)
library(viridis)
library(spatstat)
library(raster)
library(spdep)
library(FNN)
library(grid)
library(gridExtra)
library(knitr)
library(kableExtra)
library(tidycensus)
library(RColorBrewer)
library(ggthemes)
library(sp)
library(rgeos)
library(maptools)
library(dplyr)
library(units)
library(geojsonio)
library(cowplot)
library(ggcorrplot)

options(scipen=999)
options(tigris_class = "sf")

census_api_key("d5e25f48aa48bf3f0766baab06d59402ea032067", overwrite = TRUE) 

#LOAD FUNCTIONS

source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")

# crossValidate(): adapted from the sourced functions.r
# note: the dependent variable is hard-coded as countoverdose in the glm() call below
crossValidate<- function(dataset, id, dependentVariable, indVariables) {
  
  allPredictions <- data.frame()
  cvID_list <- unique(dataset[[id]])
  
  for (i in cvID_list) {
    
    thisFold <- i
    cat("This hold out fold is", thisFold, "\n")
    
    fold.train <- filter(dataset, dataset[[id]] != thisFold) %>% as.data.frame() %>% 
      dplyr::select(id, geometry, indVariables, dependentVariable)
    fold.test  <- filter(dataset, dataset[[id]] == thisFold) %>% as.data.frame() %>% 
      dplyr::select(id, geometry, indVariables, dependentVariable)
    
    regression <-
      glm(countoverdose ~ ., family = "poisson", 
          data = fold.train %>% 
            dplyr::select(-geometry, -id))
    
    thisPrediction <- 
      mutate(fold.test, Prediction = predict(regression, fold.test, type = "response"))
    
    allPredictions <-
      rbind(allPredictions, thisPrediction)
    
  }
  return(st_sf(allPredictions))
}


mapTheme<- function(base_size = 12, title_size = 16) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = title_size,colour = "black"),
    plot.subtitle=element_text(face="italic"),
    plot.caption=element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),axis.title = element_blank(),
    axis.text = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    strip.text.x = element_text(size = 14),
    legend.key = element_rect(fill=NA))
}

g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)
  }


#----CENSUS TRACTS, BLOCKGROUPS AND CITY BOUNDARY----

#CENSUS TRACTS 
tracts18 <- get_acs(geography = "tract", variables = c("B25026_001","B19013_001","B25058_001",
                                                       "B06012_002", "B25003_003", "B25003_002", "B25004_001" ), 
                    year=2018, state=04, county=013, geometry=T) %>% 
  st_transform('EPSG:2224')%>%
  filter(GEOID != '04013010102')%>%
  filter(GEOID != '04013941300')%>%
  filter(GEOID != '04013422308')%>%
  filter(GEOID != '04013422307')%>%
  filter(GEOID != '04013422508')%>%
  filter(GEOID != '04013422506')%>%
  filter(GEOID != '04013422622')%>%
  filter(GEOID != '04013318400')%>%
  filter(GEOID != '04013420100')%>%
  filter(GEOID != '04013810300')%>%
  filter(GEOID != '04013523104')%>%
  filter(GEOID != '04013420110')%>%
  filter(GEOID != '04013816900')%>%
  filter(GEOID != '04013815902')%>%
  filter(GEOID != '04013810800')%>%
  filter(GEOID != '04013815600')%>%
  filter(GEOID != '04013815200')%>%
  filter(GEOID != '04013319904')%>%
  filter(GEOID != '04013319402')%>%
  filter(GEOID != '04013319404')%>%
  filter(GEOID !="04013319906")

#CITY BOUNDARY
city_boundary <- st_as_sf(st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/City_Boundary.csv"), 
                          wkt = 'Geometry', crs = 4326, agr = 'constant')%>%
  st_transform(st_crs(tracts18))

#filter tracts within city boundary
mesa_tracts18 <- tracts18[city_boundary,]%>%
  dplyr::select( -NAME, -moe) %>%
  spread(variable, estimate) %>%
  dplyr::select(-geometry) %>%
  rename(TotalPop = B25026_001, 
         MedHHInc = B19013_001, 
         MedRent = B25058_001,
         TotalPoverty = B06012_002,
         TotalRent = B25003_003,
         TotalOwn = B25003_002,
         VacantUnits = B25004_001)

#BLOCK GROUPS
blockgroups18 <- get_acs(geography = "block group", variables = c("B01003_001","B19013_001","B25058_001",
                                                                  "B25003_003", "B25003_002", "B25004_001", "B25001_001",
                                                                  "B02001_002", "B03002_012"), 
                         year=2018, state=04, county=013, geometry=T) %>% st_transform('EPSG:2224')

mesa_bg18 <- blockgroups18[city_boundary,]%>%
  dplyr::select( -NAME, -moe) %>%
  spread(variable, estimate) %>%
  dplyr::select(-geometry) %>%
  rename(TotalPop = B01003_001, 
         MedHHInc = B19013_001, 
         MedRent = B25058_001,
         TotalRent = B25003_003,
         TotalOwn = B25003_002,
         VacantUnits = B25004_001,
         TotalUnits = B25001_001,
         TotalWhite = B02001_002,
         HispTotal = B03002_012)%>% 
  mutate(area = st_area(geometry),
         Pct.Vacant = VacantUnits/TotalUnits,
         Pct.White = TotalWhite/TotalPop,
         Pct.Hisp = HispTotal/TotalPop)%>%
  filter(GEOID != '040130101021')%>%
  filter(GEOID != '1040133184002')%>%
  filter(GEOID != '040139413002')%>%
  filter(GEOID != '040138169001')%>%
  filter(GEOID != '040139413004')%>%
  filter(GEOID != '040139413003')%>%
  filter(GEOID != '040133184002')

#----CENSUS VARIABLES----

#block groups with household income in the bottom quintile= low_income 
mesa_bg18_HHInc <-mesa_bg18 %>% drop_na(MedHHInc)%>%mutate(HHInc_q5 = q5(MedHHInc))
low_inc_BG <- mesa_bg18_HHInc%>%filter(HHInc_q5 == 1)%>%dplyr::select(GEOID,geometry,)%>%mutate(Legend = 'Low Income')

#block groups where median rent is in lower 2 quintile
mesa_bg18_rent <-mesa_bg18 %>% drop_na(MedRent)%>%mutate(MedRent_q5 = as.numeric(q5(MedRent)))
low_med_rent_BG <- mesa_bg18_rent%>%filter(MedRent_q5 <=2)%>%dplyr::select(GEOID,geometry,)%>%mutate(Legend = 'Low Rent')

#block groups where vacancy rate is higher than 25%
high_vacancy_bg <- mesa_bg18%>%filter(Pct.Vacant >= .25)%>%dplyr::select(GEOID,geometry,)%>%mutate(Legend = 'High Vacancy')

#block groups where majority hispanic
hisp_bg <- mesa_bg18%>%filter(Pct.Hisp >= .5)%>%dplyr::select(GEOID,geometry,)%>%mutate(Legend = 'Majority Hispanic')

#block groups where vast majority white
white_bg <- mesa_bg18%>%filter(Pct.White >= .85)%>%dplyr::select(GEOID,geometry,)%>%mutate(Legend = 'Majority White')


#----OPIOID DATA----
opioid_data <- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/qufy-tzv6.json")), 
                        coords = c("longitude", "latitude"), 
                        crs = 4326, agr = "constant")%>%
               st_transform(st_crs(tracts18))%>%filter(location.longitude != -112.225)
opioid_data <- opioid_data%>%mutate(point = paste(opioid_data$location.latitude, opioid_data$location.longitude))

#count number of overdose occurrences at each 1/3 mile interval point and join count column to opioid data set
count <- count(opioid_data,point)
opioid_data <- (st_join(opioid_data, count, all.x = all)%>%dplyr::select(-point.x,-point.y)%>%
  rename(count = n))

#filter points that are within city boundary
opioid_data <- opioid_data[city_boundary,]

opioid_data18 <- opioid_data%>%filter(year=='2018')


#----TRANSPORTATION----

light_rail <- st_read("data/Lightrail/LightRailStation.shp")%>%
  st_transform(st_crs(tracts18))%>%mutate(ID = 1:n(), Legend = "lightrail")%>%
  dplyr::select(ID, Legend,geometry)%>%st_centroid()
light_rail <- light_rail [city_boundary,]


 #----CRIME DATA----


crime <- read.socrata("https://data.mesaaz.gov/resource/39rt-2rfj.json")
crime <- st_as_sf(na.omit(crime), 
                        coords = c("longitude", "latitude"), 
                        crs = 4326, agr = "constant")%>%
          st_transform(st_crs(tracts18))%>%filter(report_year == '2018')
crime  <- crime[city_boundary,]

#MISDEMEANOR CRIMES
misdemeanor<- crime%>%
  filter(national_incident_based_crime_reporting_description %in% 
           c("DISORDERLY CONDUCT","SIMPLE ASSAULT","FALSE PRETENSES/SWINDLE/CONFIDENCE GAME","TRESPASS OF REAL PROPERTY",
              "CURFEW/LOITERING/VAGRANCY VIOLATION","POCKET-PICKING","DISORDERLY CONDUCT - DV","PURSE-SNATCHING",
              "FALSE PRETENSES/SWINDLE/CONFIDENCE","PEEPING TOM","SUSPICION","NOT REPOTABLE","NOT REPORTABLE","SHOPLIFTING",
              "INTIMIDATION","UNDER FALSE PRETENSES", "OTHER OFFENSES", "OTHER OFFENSE","ALL OTHER OFFENSES", "STOLEN PROPERTY OFFENSE", 
              "DESTRUCTION/DAMAGE/VANDALISM OF PROPERTY", "RUNAWAY", "COUNTERFEITING/FORGERY", "FAMILY OFFENSES, NONVIOLENT"))%>%
        mutate(Legend = 'misdemeanor')

#NON DRUG RELATED FELONY
felony<- crime%>%
  filter(!national_incident_based_crime_reporting_description %in% 
           c("DISORDERLY CONDUCT","SIMPLE ASSAULT","FALSE PRETENSES/SWINDLE/CONFIDENCE GAME","TRESPASS OF REAL PROPERTY",
             "CURFEW/LOITERING/VAGRANCY VIOLATION","POCKET-PICKING","DISORDERLY CONDUCT - DV","PURSE-SNATCHING",
             "FALSE PRETENSES/SWINDLE/CONFIDENCE","PEEPING TOM","SUSPICION","NOT REPOTABLE","NOT REPORTABLE","SHOPLIFTING",
             "INTIMIDATION","UNDER FALSE PRETENSES", "OTHER OFFENSES", "OTHER OFFENSE","ALL OTHER OFFENSES", "STOLEN PROPERTY OFFENSE", 
             "DESTRUCTION/DAMAGE/VANDALISM OF PROPERTY", "RUNAWAY", "COUNTERFEITING/FORGERY", "FAMILY OFFENSES, NONVIOLENT", 
             "DRUG/NARCOTIC VIOLATION","DRUG EQUIPMENT VIOLATION"))%>% mutate(Legend = 'felony')

# 
# ggplot()+geom_sf(data=city_boundary, fill=NA, color='black')+
#   geom_sf(data=felony, color = 'red')+geom_sf(data=misdemeanor, color = 'blue')+mapTheme()

#----CITY PROPERTIES----

#Parks 
parks<-na.omit(read.csv('data/Parks_Locations_And_Amenities.csv'))
parks<- st_as_sf(parks,coords = c("Longitude", "Latitude"),crs = 4326, agr = "constant")%>%
                         st_transform(st_crs(tracts18))%>%mutate(ID = 1:n(), Legend = "Park")%>%
                        dplyr::select(ID, Legend,geometry)
parks <- parks [city_boundary,]

#police and fire stations
mesa_police_fire<-st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/xms2-ya86.json")),
                            coords = c("longitude", "latitude"),
                            crs = 4326, agr = "constant")%>%
                 st_transform(st_crs(tracts18))%>%
                 filter(property_use %in% c("Public Safety--Fire/Police", "Park/Public Safety"))%>%
                 mutate(ID = 1:n(),Legend = "Police&Fire Station")%>%
                 dplyr::select(ID,Legend,geometry)


#vacant lots
vacant<- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/xms2-ya86.json")),
                            coords = c("longitude", "latitude"),
                            crs = 4326, agr = "constant")%>%
         st_transform(st_crs(tracts18))%>%
         filter(property_use %in% c("Vacant", "Vacant (ADOT remnant)"))%>%
         mutate(ID = 1:n(), Legend = "Vacant Property")%>%
         dplyr::select(ID, Legend,geometry)


#child crisis center
cccenter<- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/xms2-ya86.json")),
                  coords = c("longitude", "latitude"),
                  crs = 4326, agr = "constant")%>%
          st_transform(st_crs(tracts18))%>%
          filter(property_use %in% "Child Crisis Center")%>%
          mutate(ID = 1:n(),Legend = "Child Crisis Center")%>%
          dplyr::select(ID, Legend,geometry)

#arts and education centers
arts_edu<- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/xms2-ya86.json")),
                    coords = c("longitude", "latitude"),
                    crs = 4326, agr = "constant")%>%
           st_transform(st_crs(tracts18))%>%
           filter(property_use %in% c("Mesa Arts Center","Museums","Libraries","Sequoia Charter School"))%>%
           mutate(ID = 1:n(), Legend = "Arts and Education")%>%
           dplyr::select(ID, Legend,geometry)


#public housing
public_housing<- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/xms2-ya86.json")),
                    coords = c("longitude", "latitude"),
                    crs = 4326, agr = "constant")%>%
                st_transform(st_crs(tracts18))%>%
                filter(property_use %in% c("Housing", "Escobedo Housing", "NSP", "Residential Property"))%>%
                mutate(ID = 1:n(), Legend = "Public Housing")%>%
                dplyr::select(ID, Legend,geometry)


#----ZONING----

#High Density residential
HD_Residential <- st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/Zoning%20Districts.geojson")%>%
  st_transform(st_crs(tracts18))%>%
  filter(zoning %in% c("RM-3","RM-2","RM-4"))%>%
  dplyr::select(objectid, geometry)%>%mutate(zone = 'HD')

#Low Density Residential- "single residence" zoning
LD_Residential <- st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/Zoning%20Districts.geojson")%>%
  st_transform(st_crs(tracts18))%>%
  filter(zoning %in% c("RS-15","RS-35","RS-43","RS-6","RS-90","RS-9","PC","RSL-2.5","RSL-4.5","RSL-2.5"))%>%
  dplyr::select(objectid, geometry)%>%mutate(zone = 'LD')

#Commercial
Commercial <- st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/Zoning%20Districts.geojson")%>%
  st_transform(st_crs(tracts18))%>%
  filter(zoning %in% c("GC","LC","NC","OC","RSL-4.0"))%>%
  dplyr::select(objectid, geometry)%>%mutate(zone = 'com')

#Downtown
Downtown <- st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/Zoning%20Districts.geojson")%>%
  st_transform(st_crs(tracts18))%>%
  filter(zoning %in% c("DC","DB-2","DB-1", "DR-2", "DR-3", "DR-1"))%>%
  dplyr::select(objectid, geometry)%>%mutate(zone = 'DT')

#Industrial
Industrial <- st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/Zoning%20Districts.geojson")%>%
  st_transform(st_crs(tracts18))%>%
  filter(zoning %in% c("LI", "HI", "AG", "AG"))%>%
  dplyr::select(objectid, geometry)%>%mutate(zone = 'ind')


zoning<- rbind(Commercial, HD_Residential, LD_Residential, Downtown, Industrial)
zoning$zone <- factor(zoning$zone, levels = c('HD','LD','DT','com','ind'))




#ZonePlot1
ggplot()+
  geom_sf(data= zoning,
          aes(fill= zone),
          color = NA,)+
  geom_sf(data = opioid_data18%>%distinct(geometry, .keep_all = TRUE),
          aes(size = count),
          color = '#f26847', alpha = .8)+
  scale_fill_manual(values = c("#4281a4","#9cafb7", "#f26847", "#f6ad85", "#c3c1b8"),
                    labels=c('High Density Res','Low Density Res','Downtown','Commercial','Industrial')) +
  scale_size_continuous(breaks=seq(1, 4, by=1),
                        range = c(1,9))+
  geom_sf(data=city_boundary,color = '#473126',size = .5,fill = NA)+
  guides(size = guide_legend(override.aes = list(size = c(1,3,6,9))))+
  labs(size = 'Count at nearest\n 1/3 mi. interval', fill = 'Zone Type',
       caption = 'figure 2.',
       title = 'Mesa, AZ: 2018 Opioid Overdoses',
       subtitle= 'Total: 415')+
   mapTheme()+theme(panel.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   legend.background = element_rect(fill="#f0efeb", color = "#f0efeb"),
                   plot.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   legend.text = element_text(color='#2b2005'),
                   legend.title = element_text(color = '#2b2005'),
                   plot.title = element_text(color = '#2b2005', size = 22),
                   panel.grid = element_blank())


 #----MAKE FISHNET----

fishnet <- 
  st_make_grid(city_boundary,
               cellsize = 1760, 
               square = TRUE) %>%
  .[city_boundary] %>%  
  st_sf() %>%
  mutate(uniqueID = rownames(.))

#join overdoses to the fishnet
opioid_net <- 
  dplyr::select(opioid_data18) %>% 
  mutate(countoverdose = 1) %>% 
  aggregate(., fishnet, sum) %>%
  mutate(countoverdose = replace_na(countoverdose, 0),
         uniqueID = rownames(.),
         cvID = sample(round(nrow(fishnet) / 24), size=nrow(fishnet), replace = TRUE))
#opioid fishnet plot
ggplot() +
  geom_sf(data = opioid_net, aes(fill = countoverdose), color= NA)+
  scale_fill_viridis(option = 'F', direction = -1) +
  labs(title = "figure 3: Count of Overdoses for the fishnet",
       subtitle = "Mesa, AZ") +
  mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))


#----JOIN ZONING DATA TO FISHNET----

#Industrial
#Create polygon centroid
Ind_center<- st_centroid(Industrial)%>%mutate(Legend = "Industrial")


#Commercial
#Create polygon centroid
Comm_center<- st_centroid(Commercial)%>%mutate(Legend = "Commercial")


#low density residential
#Create polygon centroid
LDR_center<- st_centroid(LD_Residential)%>%mutate(Legend = "Low Density Residential")


# high density residential polygons 
#Create polygon centroid
HDR_center<- st_centroid(HD_Residential)%>%mutate(Legend = "High Density Residential")

#Create polygon centroid
DT_center<- st_centroid(Downtown)%>%mutate(Legend = "Downtown")

#join zones to fishnet + plot
zone_vars_net <- 
  rbind(DT_center, HDR_center, LDR_center, Comm_center, Ind_center) %>%
  st_join(., fishnet, join=st_within) %>%
  st_drop_geometry() %>%
  group_by(uniqueID, Legend) %>%
  summarize(count = n()) %>%
  full_join(fishnet) %>%
  spread(Legend, count, fill=0) %>%
  st_sf() %>%
  dplyr::select(-`<NA>`) %>%
  na.omit() %>%
  ungroup()

zone_vars_net.long <- 
  gather(zone_vars_net, Variable, value, -geometry, -uniqueID)
zone_vars <- unique(zone_vars_net.long$Variable)
zone_mapList <- list()

for(i in zone_vars){
  zone_mapList[[i]] <- 
    ggplot() +
    geom_sf(data = filter(zone_vars_net.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}


do.call(grid.arrange,c(zone_mapList, ncol=2, top="figure 4.1: Density of Zone Type by Fishnet"))


#----JOIN CRIME DATA TO FISHNET----
crime_vars_net <- 
  rbind(misdemeanor, felony) %>%
  st_join(., fishnet, join=st_within) %>%
  st_drop_geometry() %>%
  group_by(uniqueID,Legend) %>%
  summarize(count = n()) %>%
  full_join(fishnet) %>%
  spread(Legend, count, fill=0) %>%
  st_sf() %>%
  dplyr::select(-`<NA>`) %>%
  na.omit() %>%
  ungroup()

crime_vars_net.long <- 
  gather(crime_vars_net, Variable, value, -geometry, -uniqueID)
crime_vars <- unique(crime_vars_net.long$Variable)
crime_mapList <- list()

for(i in crime_vars){
  crime_mapList[[i]] <- 
    ggplot() +
    geom_sf(data = filter(crime_vars_net.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}

do.call(grid.arrange,c(crime_mapList, ncol=2, top="figure 4.2: CrimeRisk Factor by Fishnet"))


#----JOIN POINT VARIABLES TO FISHNET----

#Point data to fishnet
point_vars_net <- 
  rbind(arts_edu,cccenter,mesa_police_fire,parks,public_housing,vacant,parks, light_rail) %>%
  st_join(., fishnet, join=st_within) %>%
  st_drop_geometry() %>%
  group_by(uniqueID,Legend) %>%
  summarize(count = n()) %>%
  full_join(fishnet) %>%
  spread(Legend, count, fill=0) %>%
  st_sf() %>%
  dplyr::select(-`<NA>`) %>%
  na.omit() %>%
  ungroup()

point_vars_net.long <- 
  gather(point_vars_net, Variable, value, -geometry, -uniqueID)
point_vars <- unique(point_vars_net.long$Variable)
point_mapList <- list()

for(i in point_vars){
  point_mapList[[i]] <- 
    ggplot() +
    geom_sf(data = filter(point_vars_net.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}


do.call(grid.arrange,c(point_mapList, ncol=2, top="figure 4.3: Risk Factor by Fishnet"))


 #----JOINING CENSUS VARIABLES TO FISHNET----

#Point data to fishnet
census_vars_net<- 
  rbind(high_vacancy_bg, low_med_rent_BG, low_inc_BG, white_bg, hisp_bg) %>%
  st_join(fishnet, ., join=st_intersects) %>%
  st_drop_geometry() %>%
  group_by(uniqueID,Legend) %>%
  summarize(count = n()) %>%
  full_join(fishnet) %>%
  spread(Legend, count, fill=0) %>%
  st_sf() %>%
  dplyr::select(-`<NA>`) %>%
  na.omit() %>%
  ungroup()

census_vars_net.long <- 
  gather(census_vars_net, Variable, value, -geometry, -uniqueID)
census_vars <- unique(census_vars_net.long$Variable)
census_mapList <- list()

for(i in census_vars){
  census_mapList[[i]] <- 
    ggplot() +
    geom_sf(data = filter(census_vars_net.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}

#GRID SHOWS how many block groups each fishnet cell intersects with that are defined as each variable.
#maximum intersections are 4, 5, and 6.

do.call(grid.arrange,c(census_mapList, ncol=2, top="figure 4.4: Census Variable by Fishnet"))


#----JOIN POINT NN VARIABLES TO FISHNET----

#join nearest neighbor to fishnet
point_vars_net.nn <-
  point_vars_net %>%
  mutate(
    Park.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(parks),3),
    Child.Crisis.Center.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(cccenter),3),
    Arts.and.Education.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(arts_edu),3),
    Police.Fire.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(mesa_police_fire),3),
    Public.Housing.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(public_housing),3),
    Vacant.Property.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(vacant),3),
    Lightrail.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(light_rail),3))%>%
  dplyr::select(ends_with(".nn"),uniqueID, geometry)

point_vars_net.nn.long <- 
  gather(point_vars_net.nn, Variable, value, -geometry, -uniqueID)
point_vars.nn <- unique(point_vars_net.nn.long$Variable)
point_mapList.nn <- list()

for(i in point_vars.nn){
  point_mapList.nn[[i]] <- 
    ggplot() +
    geom_sf(data = filter(point_vars_net.nn.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}

do.call(grid.arrange,c(point_mapList.nn, ncol=2, top="figure 4.5: Nearest Neighbor Risk Factors by Fishnet"))

 #----JOINING CENSUS VARIABLES TO FISHNET----


census_vars_net.nn <-
  census_vars_net %>%
  mutate(
    high_vacancy.nn =
      nn_function(st_coordinates(st_centroid(census_vars_net)), st_coordinates(st_centroid(high_vacancy_bg)),3),
    low_med_rent.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(st_centroid(low_med_rent_BG)),3),
    low_inc.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(st_centroid(low_inc_BG)),3),
    maj.hisp.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(st_centroid(hisp_bg)),3),
    maj.white.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(st_centroid(white_bg)),3))%>%
  dplyr::select(ends_with(".nn"),uniqueID, geometry)

census_vars_net.nn.long <- 
  gather(census_vars_net.nn, Variable, value, -geometry, -uniqueID)
census_vars.nn <- unique(census_vars_net.nn.long$Variable)
census_mapList.nn <- list()

for(i in census_vars.nn){
  census_mapList.nn[[i]] <- 
    ggplot() +
    geom_sf(data = filter(census_vars_net.nn.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}
do.call(grid.arrange,c(census_mapList.nn, ncol=2, top="figure 4.6: Census Variable Nearest Neighbor by Fishnet"))

#----MERGE ALL VARIABLES TO FISHNET----

vars_net<- cbind(point_vars_net,(st_drop_geometry(zone_vars_net)%>%
                    dplyr::select(-uniqueID)))
vars_net<- cbind(vars_net,(st_drop_geometry(point_vars_net.nn))%>%
                   dplyr::select(-uniqueID))
vars_net<- cbind(vars_net,(st_drop_geometry(census_vars_net))%>%
                   dplyr::select(-uniqueID))
vars_net<- cbind(vars_net,(st_drop_geometry(census_vars_net.nn))%>%
                    dplyr::select(-uniqueID))
vars_net<- cbind(vars_net,(st_drop_geometry(crime_vars_net))%>%
                   dplyr::select(-uniqueID))

final_net <-
  left_join(opioid_net, st_drop_geometry(vars_net), by="uniqueID") 

final_net <-
  st_centroid(final_net) %>%
  st_join(dplyr::select(mesa_tracts18, GEOID), by = "uniqueID") %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(final_net, geometry, uniqueID)) %>%
  st_sf() %>%
  na.omit()


## generates warnings from PROJ issues
## {spdep} to make polygon to neighborhoods... 
final_net.nb <- poly2nb(as_Spatial(final_net), queen=TRUE)
## .. and neighborhoods to list of weigths
final_net.weights <- nb2listw(final_net.nb, style="W", zero.policy=TRUE)
#print(final_net.weights, zero.policy=TRUE)
## see ?localmoran
local_morans <- localmoran(final_net$countoverdose, final_net.weights, zero.policy=TRUE) %>% 
  as.data.frame()


# join local Moran's I results to fishnet

final_net.localMorans <- 
  cbind(local_morans, as.data.frame(final_net)) %>% 
  st_sf() %>%
  dplyr::select(overdose_count = countoverdose, 
                Local_Morans_I = Ii, 
                P_Value = `Pr(z != E(Ii))`) %>%
  mutate(Significant_Hotspots = ifelse(P_Value <= 0.001, 1, 0)) %>%
  gather(Variable, Value, -geometry)

vars <- unique(final_net.localMorans$Variable)
varList <- list()

for(i in vars){
  varList[[i]] <- 
    ggplot() +
    geom_sf(data = filter(final_net.localMorans, Variable == i), 
            aes(fill = Value), colour=NA) +
    scale_fill_viridis(option = "F", direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
  }

final_net <- final_net %>% 
  mutate(overdose.isSig = 
           ifelse(local_morans[,5] <= 0.001, 1, 0)) %>%
  mutate(overdose.isSig.dist = 
           nn_function(st_coordinates(st_centroid(final_net)),
                       st_coordinates(st_centroid(filter(final_net, 
                                           overdose.isSig == 1))),k = 1))
final_net.long <- 
  gather(final_net, Variable, value, -geometry, -uniqueID)


do.call(grid.arrange,c(varList, ncol = 2, top = "figure 5.1: Local Morans I statistics, Mesa, AZ Overdose Incidents"))
ggplot() +
      geom_sf(data = filter(final_net.long, Variable == "overdose.isSig.dist"), 
              aes(fill=as.numeric(value)), colour=NA) +
      scale_fill_viridis(option = 'F', direction = -1) +
      labs(fill = 'Distance (ft)',
           title= 'figure 5.2: Distance to Significant 2018 Overdose Hotspot',
           subtitle = 'Mesa, AZ') +
      mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))

#----CORRELATION PLOTS----

#### for counts
correlation.long <-
  st_drop_geometry(final_net) %>%
  dplyr::select(-uniqueID, -cvID, -GEOID) %>%
  gather(Variable, Value, -countoverdose)

##### for nn
correlation.cor <-
  correlation.long %>%
  group_by(Variable) %>%
  summarize(correlation = cor(Value, countoverdose, use = "complete.obs"))

ggplot(correlation.long, aes(Value, countoverdose)) +
  geom_point(size = 1, color = '#223843') +
  geom_text(data = correlation.cor, aes(label = paste("r =", round(correlation, 2))),
            x=-Inf, y=Inf, vjust = 1.5, hjust = -.1) +
  geom_smooth(method = "lm", se = FALSE, colour = "black") +
  facet_wrap(~Variable, ncol = 5, scales = "free") +
  labs(title = "Narcotics Incidents count as a function of risk factors",
       caption = "figure 6.1") +
  plotTheme()+theme(strip.text.x = element_text(size = 10),
                    strip.background = element_rect(fill= '#f0efeb', color=NA))

numericVars <-
  select_if(st_drop_geometry(final_net), is.numeric) %>% na.omit()

ggcorrplot(
  round(cor(numericVars), 1),
  p.mat = cor_pmat(numericVars),
  colors = c("#40042d", "#f0efeb", "#f7580f"),
  type="lower",
  insig = "blank") +
  labs(title = "Correlation across numeric variables", 
       caption = 'figure 6.2')+theme(panel.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   legend.background = element_rect(fill="#f0efeb", color = "#f0efeb"),
                   plot.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   legend.text = element_text(color='#2b2005'),
                   legend.title = element_text(color = '#2b2005'),
                   plot.title = element_text(color = '#2b2005', size = 22),
                   panel.grid = element_blank(),
                   plot.margin= margin(1,1,1,1, unit = 'cm'))

final_net %>%
  ggplot(aes(countoverdose,))+
  geom_histogram(bins = 10, colour="black", fill = '#f69c73') +
  scale_x_continuous(breaks = seq(0, 20, by = 2)) + 
  scale_y_continuous(breaks = seq(0,1500, by = 200))+
  labs(title="Distribution of Overdose Occurances", subtitle = "Mesa, AZ",
       x="Overdose Occurance Count", y="Frequency", 
       caption = "figure 7.") +plotTheme()+theme(panel.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   legend.background = element_rect(fill="#f0efeb", color = "#f0efeb"),
                   plot.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   panel.border = element_blank(),
                   legend.text = element_text(color='#2b2005'),
                   legend.title = element_text(color = '#2b2005'),
                   plot.title = element_text(color = '#2b2005', size = 22),
                   panel.grid = element_blank())
## define the variables we want

#Just Risk Factors
reg.vars <- c("Commercial", "High.Density.Residential", "Low.Density.Residential", 
                 "Park.nn", "Arts.and.Education.nn", "Police.Fire.nn", "Public.Housing", "Vacant.Property", "Low.Income", "Low.Rent",
                 "Majority.White", "Majority.Hispanic","Child.Crisis.Center", "Industrial", "misdemeanor", "felony", "lightrail")

#Include Local Moran's I Statistic
reg.ss.vars <- c("Commercial", "High.Density.Residential", "Low.Density.Residential", 
                 "Park.nn", "Arts.and.Education.nn", "Police.Fire.nn", "Public.Housing", "Vacant.Property", "Low.Income", "Low.Rent",
                 "Majority.White", "Majority.Hispanic","Child.Crisis.Center", "Industrial", "misdemeanor", "felony","lightrail", "overdose.isSig.dist", "overdose.isSig")

# "Downtown", 
## RUN REGRESSIONS

#### K-fold
#Just Risk Factors
reg.CV <- crossValidate(
  dataset = final_net,
  id = "cvID",                           
  dependentVariable = "countoverdose",
  indVariables = reg.vars) %>%
  dplyr::select(cvID, countoverdose, Prediction, geometry)

#Spatial Process
reg.ss.CV <- crossValidate(
  dataset = final_net,
  id = "cvID",                           
  dependentVariable = "countoverdose",
  indVariables = reg.ss.vars) %>%
  dplyr::select(cvID, countoverdose, Prediction, geometry)


#### Spatial LOGO-CV
#Just Risk Factors
reg.spatialCV <- crossValidate(
  dataset = final_net,
  id = "GEOID",                           
  dependentVariable = "countoverdose",
  indVariables = reg.vars) %>%
  dplyr::select(cvID = GEOID, countoverdose, Prediction, geometry)


#Spatial Process
reg.ss.spatialCV <- crossValidate(
  dataset = final_net,
  id = "GEOID",                           
  dependentVariable = "countoverdose",
  indVariables = reg.ss.vars) %>%
  dplyr::select(cvID = GEOID, countoverdose, Prediction, geometry)

#### bind

reg.summary <- 
  rbind(
    mutate(reg.CV,        Error = Prediction - countoverdose,
           Regression = "Random k-fold CV: Just Risk Factors"),
    mutate(reg.ss.CV,        Error = Prediction - countoverdose,
           Regression = "Random k-fold CV: Spatial Process"),
    mutate(reg.spatialCV, Error = Prediction - countoverdose,
           Regression = "Spatial LOGO-CV: Just Risk Factors"),
    mutate(reg.ss.spatialCV, Error = Prediction - countoverdose,
           Regression = "Spatial LOGO-CV: Spatial Process")) %>%
  st_sf() 

# calculate errors by Block Group
error_by_reg_and_fold <- 
  reg.summary  %>%
  group_by(Regression, cvID) %>% 
  summarize(Mean_Error = mean(Prediction - countoverdose, na.rm = T),
            MAE = mean(abs(Mean_Error), na.rm = T),
            SD_MAE = mean(abs(Mean_Error), na.rm = T)) %>%
  ungroup()



#----HISTOGRAMS----
error_by_reg_and_fold %>% ggplot(aes(MAE))+
  geom_histogram(bins=30, color = 'black', fill = '#f69c73')+
  facet_wrap(~Regression)+
  geom_vline(xintercept= 0)+scale_x_continuous(breaks=seq(0,8,by=1))+
  labs(title='Distribution of MAE', subtitle='k-fold cross validation vs. LOGO-CV',
       x= 'Mean Absolute Error', y = 'Count', caption = 'figure 8.')+plotTheme()+theme(strip.text.x = element_text(size = 10),
                    strip.background = element_rect(fill= '#f0efeb', color=NA))

#-----SPATIAL PLOTS-----

ggplot() +
  geom_sf(data = reg.summary, aes(fill = Prediction, colour = Prediction)) +
  scale_fill_viridis(option = "F", direction = -1) +
  scale_colour_viridis(option = "F", direction = -1) +
  facet_wrap(~Regression) +  
  labs(title="Narcotics Incidents Errors", subtitle = "Random K-Fold and Spatial Cross Validation", caption = "figure 9.") +
  mapTheme()+theme(strip.text.x = element_text(size = 10),
                    strip.background = element_rect(fill= '#f0efeb', color=NA))
st_drop_geometry(error_by_reg_and_fold) %>%
  group_by(Regression) %>% 
  summarize(Mean_MAE = round(mean(MAE), 2),
            SD_MAE = round(sd(MAE), 2)) %>%
  kable() %>%kable_styling(html_font = 'helvetica')%>%
  kable_paper()%>%column_spec(c(1,2,3), background = "#f0efeb")%>%row_spec(0, background = "#f0efeb", color= 'black',hline_after = TRUE)


#----NEIGHBORHOOD WEIGHTS-----
  
neighborhood.weights <-
  filter(error_by_reg_and_fold, Regression == "Spatial LOGO-CV: Spatial Process") %>%
  group_by(cvID) %>%
  poly2nb(as_Spatial(.), queen=TRUE) %>%
  nb2listw(., style="W", zero.policy=TRUE)

filter(error_by_reg_and_fold, str_detect(Regression, "LOGO"))  %>% 
  st_drop_geometry() %>%
  group_by(Regression) %>%
  summarize(Morans_I = moran.mc(abs(Mean_Error), neighborhood.weights, 
                                nsim = 999, zero.policy = TRUE, 
                                na.action=na.omit)[[1]],
            p_value = moran.mc(abs(Mean_Error), neighborhood.weights, 
                               nsim = 999, zero.policy = TRUE, 
                               na.action=na.omit)[[3]])%>%
  kable() %>%kable_styling(html_font = 'helvetica')%>%
  kable_paper()%>%column_spec(c(1,2,3), background = "#f0efeb")%>%row_spec(0, background = "#f0efeb", color= 'black',hline_after = TRUE)


grid.arrange(
  reg.summary%>%filter(Regression == "Spatial LOGO-CV: Just Risk Factors")%>%
    ggplot() + geom_sf(aes(fill = Prediction, colour = Prediction))+
    scale_fill_viridis(option = "F", direction = -1) +
    scale_colour_viridis(option = "F", direction = -1)+
    labs(title='Spatial LOGO-CV: \nJust Risk Factors',
         caption = 'figure 11.')+mapTheme()+theme(panel.border=element_blank(),
                                                      title = element_text(size = 12),
                                                                      legend.position = "bottom",
                                                                      legend.key.size = unit(.5, 'cm')),
  reg.summary%>%filter(Regression == "Spatial LOGO-CV: Spatial Process")%>%
    ggplot() + geom_sf(aes(fill = Prediction, colour = Prediction))+
    scale_fill_viridis(option = "F", direction = -1) +
    scale_colour_viridis(option = "F", direction = -1)+

    labs(title='Spatial LOGO-CV: \nSpatial Process')+mapTheme()+theme(panel.border=element_blank(),
                                                      title = element_text(size = 12),
                                                                    legend.position = "bottom",
                                                                    legend.key.size = unit(.5, 'cm')),
  ggplot() +
    geom_sf(data = reg.summary, aes(fill = countoverdose, colour = countoverdose))+
    scale_fill_viridis(option = "F", direction = -1) +
    scale_colour_viridis(option = "F", direction = -1)+
    labs(title='Observed \nOverdoses')+mapTheme()+theme(panel.border=element_blank(),
                                                      title = element_text(size = 12),
                                                      legend.position = "bottom",
                                                      legend.key.size = unit(.5, 'cm')), nrow=1)

st_drop_geometry(reg.summary) %>%
  group_by(Regression) %>%
  mutate(overdose_Decile = ntile(countoverdose, 10)) %>%
  group_by(Regression, overdose_Decile) %>%
  summarize(meanObserved = mean(countoverdose, na.rm=T),
            meanPrediction = mean(Prediction, na.rm=T)) %>%
  gather(Variable, Value, -Regression, -overdose_Decile) %>%          
  ggplot(aes(overdose_Decile, Value, shape = Variable)) +
  geom_point(size = 2) + geom_path(aes(group = overdose_Decile), colour = "black") +
  scale_shape_manual(values = c(2, 17)) +
  facet_wrap(~Regression) + xlim(0,10) +
  labs(title = "Predicted and observed overdose by observed overdose decile",
       caption = 'figure 12.')  +
  plotTheme()+theme(strip.text.x = element_text(size = 10),
                    strip.background = element_rect(fill= '#f0efeb', color=NA))

#Test generalizability across block group
#Get 2019 block group data
blockgroups19 <- get_acs(geography = "block group", variables = c("B01003_001","B03002_012"), 
                         year=2019, state=04, county=013, geometry=T) %>% 
  st_transform('EPSG:2224')  %>% 
  dplyr::select(variable, estimate, GEOID) %>%
  spread(variable, estimate) %>%
  rename(TotalPop = B01003_001,
         HispTotal = B03002_012) %>%
  mutate(Pct.Hisp = HispTotal/TotalPop,
         raceContext = ifelse(Pct.Hisp > .5, "Majority_Hispanic", "Majority_Non_Hispanic")) %>%
  .[mesa_tracts18,]


reg.summary %>%
  st_centroid() %>%
  st_join(blockgroups19) %>%
  na.omit() %>%
  st_drop_geometry() %>%
  group_by(Regression, raceContext) %>%
  summarize(mean.Error = mean(Error, na.rm = T)) %>%
  spread(raceContext, mean.Error) %>%
  kable() %>%kable_styling(html_font = 'helvetica')%>%
  kable_paper()%>%column_spec(c(1,2,3), background = "#f0efeb")%>%row_spec(0, background = "#f0efeb", color= 'black',hline_after = TRUE)

# In this section, we ask whether the risk predictions outperform traditional kernel density 'hotspot' mapping.
# To add an element of across-time generalizability,
# kernel density and risk predictions estimated from the 2018 overdoses are used to predict the location of 2019 overdoses.

# Goodness of fit indicator - illustrates whether the 2018 kernel density or the risk predictions capture
# more of the 2019 overdoses. If the risk predictions capture more observed overdoses than the kernel density,
# then the risk prediction model provides the more robust targeting tool for allocating public health resources.

# Step 1: subset the 2019 (test year) overdose data
opioid_data19 <- opioid_data %>% filter(year == '2019')

# Step 2: kernel density estimated from the 2018 (training year) overdose points only,
# so the comparison below is not contaminated by the 2019 test set
opioid_ppp <- as.ppp(st_coordinates(opioid_data18), W = st_bbox(final_net))
opioid_KD <- density.ppp(opioid_ppp, 1000)   # 1000 ft. bandwidth (EPSG:2224 is a feet-based CRS)

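# Aggregate the kernel density surface to the fishnet, bin cells into percentile-based risk categories, and count the 2019 overdoses falling in each cell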
opioid_KDE_sf <- as.data.frame(opioid_KD) %>%
  st_as_sf(coords = c("x", "y"), crs = st_crs(final_net)) %>%
  aggregate(., final_net, mean) %>%
  mutate(label = "Kernel Density",
         Risk_Category = ntile(value, 100),
         Risk_Category = case_when(
           Risk_Category >= 90 ~ "90% to 100%",
           Risk_Category >= 70 & Risk_Category <= 89 ~ "70% to 89%",
           Risk_Category >= 50 & Risk_Category <= 69 ~ "50% to 69%",
           Risk_Category >= 30 & Risk_Category <= 49 ~ "30% to 49%",
           Risk_Category >= 1 & Risk_Category  <= 29 ~ "1% to 29%")) %>%
  cbind(
    aggregate(
      dplyr::select(opioid_data19) %>% mutate(overdoseCount = 1), ., sum) %>%
      mutate(overdoseCount = replace_na(overdoseCount, 0))) %>%
  dplyr::select(label, Risk_Category, overdoseCount)

# Step 3: bin the model's risk predictions into the same risk categories and count 2019 overdoses per cell
opioid_risk_sf <-
  filter(reg.summary, Regression == "Spatial LOGO-CV: Spatial Process") %>%
  mutate(label = "Risk Predictions",
         Risk_Category = ntile(Prediction, 100),
         Risk_Category = case_when(
           Risk_Category >= 90 ~ "90% to 100%",
           Risk_Category >= 70 & Risk_Category <= 89 ~ "70% to 89%",
           Risk_Category >= 50 & Risk_Category <= 69 ~ "50% to 69%",
           Risk_Category >= 30 & Risk_Category <= 49 ~ "30% to 49%",
           Risk_Category >= 1 & Risk_Category <= 29 ~ "1% to 29%")) %>%
  cbind(
    aggregate(
      dplyr::select(opioid_data19) %>% mutate(overdoseCount = 1), ., sum) %>%
      mutate(overdoseCount = replace_na(overdoseCount, 0))) %>%
  dplyr::select(label, Risk_Category, overdoseCount)

# Step 4: map the risk categories for both approaches with the 2019 overdose points overlaid (figure 13)
rbind(opioid_KDE_sf , opioid_risk_sf) %>%
  na.omit() %>%
  gather(Variable, Value, -label, -Risk_Category, -geometry) %>%
  ggplot() +
  geom_sf(aes(fill = Risk_Category), colour = NA) +
  geom_sf(data = opioid_data19%>%distinct(geometry, .keep_all=TRUE),
          aes(size = count),
          color = '#000000',
          alpha = .65)+
  scale_size_continuous(breaks=seq(1, 75, by=15),
                        range = c(1,6))+
  guides(size= guide_legend(title = "2019 Overdoses",override.aes = list(size = c(1,2,4,5,6),color = '#424453',alpha=.45)))+
  facet_wrap(~label) +
  scale_fill_viridis(option = "F", direction =-1, discrete = TRUE, alpha = .75) +
  labs(title="Comparison of Kernel Density and Risk Predictions",
       subtitle="2018 Overdose Risk Predictions; 2019 Overdoses", caption = "figure 13.") +
  mapTheme()+theme(strip.text.x = element_text(size = 10),
                    strip.background = element_rect(fill= '#f0efeb', color=NA))


# Kernel density of overdose points, 2018 vs. 2019 (figure 14)
p1<- ggplot()+
  geom_sf(data=city_boundary, fill='#22192b', color='black')+
  stat_density2d(data= data.frame(st_coordinates(opioid_data18)),
                 aes(X,Y, fill=..level..,alpha=..level..),
                 size=0.01, bins=40, geom='polygon')+
  scale_fill_viridis(option = 'F',  direction = 1) +
  scale_alpha(range=c(0.00,0.35), guide='none')+
  geom_sf(data = opioid_data18%>%distinct(geometry, .keep_all=TRUE),
          aes(size = count),
          color = '#fffaf2',
          alpha = .5)+
  guides(size = guide_legend(override.aes = list(color = '#2f273d', alpha=.2)))+
  labs(subtitle = '2018',
       fill = 'Density',
       size = 'Count at nearest\n 1/3 mi. interval',
       caption = 'figure 14.')+
  scale_size_continuous(breaks=seq(1, 61, by=15),
                        range = c(1,10))+ mapTheme()+theme(legend.key.size = unit(.65, 'cm'))

p2<- ggplot()+
  geom_sf(data=city_boundary, fill='#22192b', color='black')+
  stat_density2d(data= data.frame(st_coordinates(opioid_data19)),
                 aes(X,Y, fill=..level..,alpha=..level..),
                 size=0.01, bins=40, geom='polygon')+
  scale_fill_viridis(option = 'F',  direction = 1) +
  scale_alpha(range=c(0.00,0.35), guide='none')+
  geom_sf(data = opioid_data19%>%distinct(geometry, .keep_all=TRUE),
          aes(size = count),
          color = '#fffaf2',
          alpha = .5)+
  scale_size_continuous(breaks=seq(1, 61, by=15),
                        range = c(1,10))+ 
  guides(size = guide_legend(override.aes = list(color = '#2f273d', alpha=.2)))+
  labs(subtitle = '2019',
       fill = 'Density',
       size = 'Count at nearest\n 1/3 mi. interval')+
  mapTheme()

p3<- ggplot()+
  geom_sf(data=city_boundary, fill='#22192b', color='black')+
  stat_density2d(data= data.frame(st_coordinates(opioid_data18)),
                 aes(X,Y, fill=..level..,alpha=..level..),
                 size=0.01, bins=40, geom='polygon')+
  scale_fill_viridis(option = 'F',  direction = 1) +
  scale_alpha(range=c(0.00,0.35), guide='none')+
  labs(subtitle = 'Kernel Density 2018')+ mapTheme()

p4<- ggplot()+
  geom_sf(data=city_boundary, fill='#22192b', color='black')+
  stat_density2d(data= data.frame(st_coordinates(opioid_data19)),
                 aes(X,Y, fill=..level..,alpha=..level..),
                 size=0.01, bins=40, geom='polygon')+
  scale_fill_viridis(option = 'F',  direction = 1) +
  scale_alpha(range=c(0.00,0.35), guide='none')+
  labs(subtitle = 'Kernel Density 2019')+ mapTheme()

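# g_legend() below extracts a single shared legend for the grid.arrange() call. It is assumed
# to be a helper defined earlier in this document; if not, a minimal sketch could look like:
if (!exists("g_legend")) {
  g_legend <- function(a.gplot) {
    tmp <- ggplot_gtable(ggplot_build(a.gplot))                          # build the plot's gtable
    leg <- grep("guide-box", sapply(tmp$grobs, function(x) x$name))[1]   # locate the legend grob
    tmp$grobs[[leg]]                                                     # return it for grid.arrange()
  }
}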
mylegend<-g_legend(p1)

grid.arrange(p3 + theme(legend.position="none"),
             p4 + theme(legend.position="none"),
             mylegend, 
             p1 + theme(legend.position="none"),
             p2 + theme(legend.position="none"),
             nrow=2, ncol=3, widths=c(2,2,.5), top = "Opioid Overdose Occurrences in Mesa, AZ")
  

# Calculate the rate of 2019 (test set) overdose points captured, by risk category and model type.
# A well fit model should show that the risk predictions capture a greater share of 2019 overdoses
# in the highest risk category relative to the kernel density.

rbind(opioid_KDE_sf , opioid_risk_sf) %>%
  st_set_geometry(NULL) %>% na.omit() %>%
  gather(Variable, Value, -label, -Risk_Category) %>%
  group_by(label, Risk_Category) %>%
  summarize(countoverdose = sum(Value)) %>%
  ungroup() %>%
  group_by(label) %>%
  mutate(Rate_of_test_set_overdose = countoverdose / sum(countoverdose)) %>%
  ggplot(aes(Risk_Category,Rate_of_test_set_overdose)) +
  geom_bar(aes(fill=label), position="dodge", stat="identity") +
  scale_fill_manual(values=c('#f69c73', "#721f58"))+
  labs(title = "Risk prediction vs. Kernel density, 2019 overdoses",
       caption = 'figure 15.') +
  plotTheme() + theme(axis.text.x = element_text(angle = 45, vjust = 0.5))

# Join predictions to zoning districts; first compute absolute errors for the spatial LOGO-CV model
reg.summary.logo <- reg.summary %>% filter(Regression == "Spatial LOGO-CV: Spatial Process") %>%
  mutate(abs_Error = abs(Error))

grid.arrange(
#LOGO CV Spatial Predictions
reg.ss.spatialCV%>%
  ggplot() + geom_sf(aes(fill = Prediction, colour = Prediction))+
  scale_fill_viridis(option = "F", direction = -1) +
  scale_colour_viridis(option = "F", direction = -1)+
  labs(title='Predictions',subtitle='Spatial LOGO-CV: Spatial Process', caption = 'figure 16.1')+
  mapTheme()+theme(panel.border=element_blank(),legend.position = "bottom",legend.key.size = unit(.5, 'cm')),

#LOGO CV Spatial Errors
ggplot() +
  geom_sf(data = reg.summary.logo, aes(fill = abs_Error, colour = abs_Error)) +
  scale_fill_viridis(option = "F", direction = -1) +
  scale_colour_viridis(option = "F", direction = -1) +
  labs(title = 'Absolute Errors', subtitle='Spatial LOGO-CV: Spatial Process', caption = 'figure 16.2') +
  mapTheme()+theme(panel.border=element_blank(),legend.position = "bottom",legend.key.size = unit(.5, 'cm')), nrow=1)

# Combine zoning districts: primary candidate areas (commercial + downtown)
# and secondary candidate areas (high- and low-density residential)
site <- rbind(Commercial, Downtown)
site.second <- rbind(HD_Residential, LD_Residential)
# reg.ss.spatialCV%>%
#   ggplot() +
#   geom_sf(data=site, fill=NA, color = 'black', size = .5, alpha = .5)+ 
#   geom_sf(aes(fill = Prediction, colour = Prediction), alpha=.5)+
#   geom_sf(data=city_boundary, fill = NA)+
#   scale_fill_viridis(option = "F", direction = -1) +
#   scale_colour_viridis(option = "F", direction = -1)+
#   labs(title='Predictions',subtitle='Spatial LOGO-CV: Spatial Process')+
#   mapTheme()+theme(panel.border=element_blank(),legend.position = "bottom",legend.key.size = unit(.5, 'cm'))
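# Candidate OOTC sites: zoning polygons joined to fishnet cells where the model predicts more than one overdose (Prediction > 1)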
OOTC.site<- st_join(site, reg.ss.spatialCV, join=st_intersects)%>%filter(Prediction > 1)
OOTC.site.second<- st_join(site.second, reg.ss.spatialCV, join=st_intersects)%>%filter(Prediction > 1)

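# Figure 17: map primary (commercial/downtown) and secondary (residential) candidate zones over the citywide zoning base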
ggplot() +
  geom_sf(data = zoning,
          fill = 'grey90',
          color = NA) +
  geom_sf(data =city_boundary, fill = NA, color ='black')+
  # geom_sf(data = mesa_tracts17, fill = NA, color ='grey75', size = .25)+
  geom_sf(data=OOTC.site, aes(fill = Prediction), color =NA)+
  scale_fill_viridis(option = "F", direction = -1) +
  geom_sf(data=OOTC.site.second, fill = 'lightblue', color =NA)+
  labs(title = 'Commercial and Downtown Zones with High Prediction Value',
       subtitle = 'High- and low-density residential zones above the same threshold shown in light blue',
       caption = 'figure 17.') +
  mapTheme()