::p_load(sf, spdep, tmap, tidyverse) pacman
Hands-on Exercise 2: Global Measures of Spatial Autocorrelation
Overview
In this hands-on exercise, I learned how to compute Global and Local Measure of Spatial Autocorrelation (GLSA) via spdep package.
import geospatial data using appropriate function(s) of sf package,
import csv file using appropriate function of readr package,
perform relational join using appropriate join function of dplyr package,
compute Global Spatial Autocorrelation (GSA) statistics by using appropriate functions of spdep package,
plot Moran scatterplot,
compute and plot spatial correlogram using appropriate function of spdep package.
compute Local Indicator of Spatial Association (LISA) statistics for detecting clusters and outliers by using appropriate functions spdep package;
compute Getis-Ord’s Gi-statistics for detecting hot spot or/and cold spot area by using appropriate functions of spdep package; and
to visualise the analysis output by using tmap package.
Study Area & Data
Hunan province administrative boundary layer at county level. This is a geospatial data set in ESRI shapefile format.
Hunan_2012.csv: This csv file contains selected Hunan’s local development indicators in 2012.
Getting Started
The code chunk below installs and load sf and tidyverse packages into R environment
Getting the Data Into R Environment
Bring geospatial data (ESRI shapefile) and its associated attribute (csv) into R environment.
Import shapefile into R environment
st_read() is used to import Hunan shapefile into R. The imported shapefile will be simple features Object of sf.
<- st_read(dsn = "data/geospatial",
hunan layer = "Hunan")
Reading layer `Hunan' from data source
`C:\lnealicia\ISSS624\Handson_ex02\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 88 features and 7 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 108.7831 ymin: 24.6342 xmax: 114.2544 ymax: 30.12812
Geodetic CRS: WGS 84
Import csv file into r environment
Import Hunan_2012.csv into R using read_csv(), the output is a R dataframe class.
<- read_csv("data/aspatial/Hunan_2012.csv") hunan2012
Relational Join
Update the attribute table of hunan’s SpatialPolygonsDataFrame with attribute fields of hunan2012 dataframe via leftjoin() of dplyr package.
<- left_join(hunan,hunan2012)%>%
hunan select(1:4, 7, 15)
Visualising Regional Development Indicator
Prepare a basemap and a choropleth map showing the distribution of GDPPC 2012 by using qtm() of tmap package.
<- tm_shape(hunan) +
equal tm_fill("GDPPC",
n = 5,
style = "equal") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Equal interval classification")
<- tm_shape(hunan) +
quantile tm_fill("GDPPC",
n = 5,
style = "quantile") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Equal quantile classification")
tmap_arrange(equal,
quantile, asp=1,
ncol=2)
Computing Contiguity Spatial Weights
Use poly2nb() of spdep package to compute contiguity weight matrices for the study area. This function builds a neighbours list based on regions with contiguous boundaries. Note: A “queen” argument taking TRUE/FALSE option can be passed. Default is set to TRUE.
Global Spatial Autocorrelation
Compute global spatial autocorrelation statistics and to perform spatial complete randomness test for global spatial autocorrelation.
Computing Contiguity Spatial Weights
Construct a spatial weights of the study area to compute the global spatial autocorrelation statistics
poly2nb() of spdep package is used to compute contiguity weight matrices for the study area.
<- poly2nb(hunan, queen=TRUE)
wm_q summary(wm_q)
Neighbour list object:
Number of regions: 88
Number of nonzero links: 448
Percentage nonzero weights: 5.785124
Average number of links: 5.090909
Link number distribution:
1 2 3 4 5 6 7 8 9 11
2 2 12 16 24 14 11 4 2 1
2 least connected regions:
30 65 with 1 link
1 most connected region:
85 with 11 links
The summary report above shows that there are 88 area units in Hunan. The most connected area unit has 11 neighbours. There are two area units with only one neighbours.
Row-standardised weights matrix
Assign weights to each neighboring polygon. Below, each neighboring polygon will be assigned equal weight (style=“W”).
This is done via assigning the fraction 1/ (# of neighbours) to each neighbouring country then adding up the weighted income values.
Drawback: Polygons along the edges of the study area will base their lagged values on fewer polygons, thus potentially over- or under-estimating the true nature of the spatial autocorrelation in the data.
<- nb2listw(wm_q, style="W", zero.policy = TRUE)
rswm_q rswm_q
Characteristics of weights list object:
Neighbour list object:
Number of regions: 88
Number of nonzero links: 448
Percentage nonzero weights: 5.785124
Average number of links: 5.090909
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 88 7744 88 37.86334 365.9147
The input of nb2listw() must be an object of class nb. The syntax of the function has two major arguments, namely style and zero.poly.
Global Spatial Autocorrelation: Moran’s I
Perform Moran’s I statistics testing by using moran.test() of spdep.
moran.test(hunan$GDPPC,
listw=rswm_q,
zero.policy = TRUE,
na.action=na.omit)
Moran I test under randomisation
data: hunan$GDPPC
weights: rswm_q
Moran I statistic standard deviate = 4.7351, p-value = 1.095e-06
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.300749970 -0.011494253 0.004348351
Conclusion: Given the small p-value < 0.05, we reject the null hypothesis of spatial randomness. Therefore, there is strong evidence that the variable hunan$GDPPC
exhibits positive spatial autocorrelation in relation to the specified spatial weights. Regions with similar values of hunan$GDPPC
are clustered together, suggesting a spatial pattern in the distribution of this variable.
Computing Monte Carlo Moran’s I
Perform permutation test for Moran’s I statistic by using moran.mc() of spdep. A total of 1000 simulation will be performed.
set.seed(1234)
= moran.mc(hunan$GDPPC,
bpermlistw=rswm_q,
nsim=999,
zero.policy = TRUE,
na.action=na.omit)
bperm
Monte-Carlo simulation of Moran I
data: hunan$GDPPC
weights: rswm_q
number of simulations + 1: 1000
statistic = 0.30075, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
Conclusion: Given the small p-value < 0.05, we reject the null hypothesis of spatial randomness. Therefore, there is strong evidence that the variable hunan$GDPPC
exhibits positive spatial autocorrelation in relation to the specified spatial weights. Regions with similar values of hunan$GDPPC
are clustered together, suggesting a spatial pattern in the distribution of this variable.
Visualising Monte Carlo Moran’s I
Plotting the distribution of the statistical values as a histogram
mean(bperm$res[1:999])
[1] -0.01504572
var(bperm$res[1:999])
[1] 0.004371574
summary(bperm$res[1:999])
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.18339 -0.06168 -0.02125 -0.01505 0.02611 0.27593
hist(bperm$res,
freq=TRUE,
breaks=20,
xlab="Simulated Moran's I")
abline(v=0,
col="red")
Conclusion: The statistical observation suggests that the observed spatial autocorrelation in the variable hunan$GDPPC
is unlikely to be due to random chance, as indicated by its position (red line) in the distribution of simulated Moran I values.
Average neighbouring income values for each country
head(hunan)
Simple feature collection with 6 features and 6 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 110.4922 ymin: 28.61762 xmax: 112.3013 ymax: 30.12812
Geodetic CRS: WGS 84
NAME_2 ID_3 NAME_3 ENGTYPE_3 County GDPPC
1 Changde 21098 Anxiang County Anxiang 23667
2 Changde 21100 Hanshou County Hanshou 20981
3 Changde 21101 Jinshi County City Jinshi 34592
4 Changde 21102 Li County Li 24473
5 Changde 21103 Linli County Linli 25554
6 Changde 21104 Shimen County Shimen 27137
geometry
1 POLYGON ((112.0625 29.75523...
2 POLYGON ((112.2288 29.11684...
3 POLYGON ((111.8927 29.6013,...
4 POLYGON ((111.3731 29.94649...
5 POLYGON ((111.6324 29.76288...
6 POLYGON ((110.8825 30.11675...
Global Spatial Autocorrelation: Geary’s
Perform Geary’s c statistics testing by using appropriate functions of spdep package
Geary’s C test
geary.test(hunan$GDPPC, listw=rswm_q)
Geary C test under randomisation
data: hunan$GDPPC
weights: rswm_q
Geary C statistic standard deviate = 3.6108, p-value = 0.0001526
alternative hypothesis: Expectation greater than statistic
sample estimates:
Geary C statistic Expectation Variance
0.6907223 1.0000000 0.0073364
Conclusion: Given that the p-value is <0.05, we reject the null hypothesis of syatial randomness. The small p-value indicates that the observed Geary C statistic is statistically significant, providing evidence that there is positive spatial autocorrelation in the variable hunan$GDPPC
with respect to the specified spatial weights (rswm_q
). The spatial pattern observed is unlikely to have occurred by random chance alone.
Computing Monte Carlo Geary’s C
Performs permutation test for Geary’s C statistic by using geary.mc() of spdep.
set.seed(1234)
=geary.mc(hunan$GDPPC,
bpermlistw=rswm_q,
nsim=999)
bperm
Monte-Carlo simulation of Geary C
data: hunan$GDPPC
weights: rswm_q
number of simulations + 1: 1000
statistic = 0.69072, observed rank = 1, p-value = 0.001
alternative hypothesis: greater
Conclusion: Given that the p-value is <0.05, we reject the null hypothesis of syatial randomness. The small p-value indicates that the observed Geary C statistic is statistically significant, providing evidence that there is positive spatial autocorrelation in the variable hunan$GDPPC
with respect to the specified spatial weights (rswm_q
). The spatial pattern observed is unlikely to have occurred by random chance alone.
Visualising the Monte Carlo Geary’s C
Plot a histogram to reveal the distribution of the simulated values
mean(bperm$res[1:999])
[1] 1.004402
var(bperm$res[1:999])
[1] 0.007436493
summary(bperm$res[1:999])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.7142 0.9502 1.0052 1.0044 1.0595 1.2722
hist(bperm$res, freq=TRUE, breaks=20, xlab="Simulated Geary c")
abline(v=1, col="red")
Conclusion: The statistical observation suggests that the observed spatial autocorrelation in the variable hunan$GDPPC
is unlikely to be due to random chance, as indicated by its position (red line) in the distribution of simulated Moran I values.
Spatial Correlogram
Spatial correlograms show how correlated are pairs of spatial observations when you increase the distance (lag) between them - they are plots of some index of autocorrelation (Moran’s I or Geary’s c) against distance. Useful as an exploratory and descriptive tool.
Compute Moran’s I correlogram
sp.correlogram() of spdep package is used to compute a 6-lag spatial correlogram of GDPPC. The global spatial autocorrelation used in Moran’s I. The plot() of base Graph is then used to plot the output.
<- sp.correlogram(wm_q,
MI_corr $GDPPC,
hunanorder=6,
method="I",
style="W")
plot(MI_corr)
Note: Output plot may not provide complete interpretation, since not all autocorrelation values are statistically significant. Thus, it is important to examine the full analysis report by printing out the analysis results.
print(MI_corr)
Spatial correlogram for hunan$GDPPC
method: Moran's I
estimate expectation variance standard deviate Pr(I) two sided
1 (88) 0.3007500 -0.0114943 0.0043484 4.7351 2.189e-06 ***
2 (88) 0.2060084 -0.0114943 0.0020962 4.7505 2.029e-06 ***
3 (88) 0.0668273 -0.0114943 0.0014602 2.0496 0.040400 *
4 (88) 0.0299470 -0.0114943 0.0011717 1.2107 0.226015
5 (88) -0.1530471 -0.0114943 0.0012440 -4.0134 5.984e-05 ***
6 (88) -0.1187070 -0.0114943 0.0016791 -2.6164 0.008886 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The spatial correlogram suggests that there is positive spatial autocorrelation in the variable hunan$GDPPC
across different distance classes, and this autocorrelation is statistically significant. The pattern is not likely to be due to random chance alone.
Compute Geary’s C correlogram and plot
sp.correlogram() of spdep package is used to compute a 6-lag spatial correlogram of GDPPC. The global spatial autocorrelation used in Geary’s C. The plot() of base Graph is then used to plot the output.
<- sp.correlogram(wm_q,
GC_corr $GDPPC,
hunanorder=6,
method="C",
style="W")
plot(GC_corr)
print(GC_corr)
Spatial correlogram for hunan$GDPPC
method: Geary's C
estimate expectation variance standard deviate Pr(I) two sided
1 (88) 0.6907223 1.0000000 0.0073364 -3.6108 0.0003052 ***
2 (88) 0.7630197 1.0000000 0.0049126 -3.3811 0.0007220 ***
3 (88) 0.9397299 1.0000000 0.0049005 -0.8610 0.3892612
4 (88) 1.0098462 1.0000000 0.0039631 0.1564 0.8757128
5 (88) 1.2008204 1.0000000 0.0035568 3.3673 0.0007592 ***
6 (88) 1.0773386 1.0000000 0.0058042 1.0151 0.3100407
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1