Logistic and Auto-logistic regression

Performs a logistic (binomial) or auto-logistic (spatially lagged binomial) regression using maximum likelihood or penalized maximum likelihood estimation.

It should be noted that the auto-logistic model (Besag 1972) is intended for exploratory analysis of spatial effects. Auto-logistic are know to underestimate the effect of environmental variables and tend to be unreliable (Dormann 2007). Wij matrix options under style argument - B is the basic binary coding, W is row standardized (sums over all links to n), C is globally standardized (sums over all links to n), U is equal to C divided by the number of neighbours (sums over all links to unity) and S is variance-stabilizing. Spatially lagged y defined as: W(y)ij=sumj_(Wij yj)/ sumj_(Wij) where; Wij=1/Euclidean(i,j) If the object passed to the function is an sp class there is no need to call the data slot directly via "object@data", just pass the object name.

logistic.regression(
  ldata,
  y,
  x,
  penalty = TRUE,
  autologistic = FALSE,
  coords = NULL,
  bw = NULL,
  type = "inverse",
  style = "W",
  longlat = FALSE,
  ...
)

Arguments

ldata	data.frame object containing variables
y	Dependent variable (y) in ldata
x	Independent variable(s) (x) in ldata
penalty	Apply regression penalty (TRUE/FALSE)
autologistic	Add auto-logistic term (TRUE/FALSE)
coords	Geographic coordinates for auto-logistic model matrix or sp object.
bw	Distance bandwidth to calculate spatial lags (if empty neighbors result, need to increase bandwidth). If not provided it will be calculated automatically based on the minimum distance that includes at least one neighbor.
type	Neighbor weighting scheme (see autocov_dist)
style	Type of neighbor matrix (Wij), default is mean of neighbors
longlat	Are coordinates (coords) in geographic, lat/long (TRUE/FALSE)
...	Additional arguments passed to lrm

Value

A list class object with the following components:

model - lrm model object (rms class)
bandwidth - If AutoCov = TRUE returns the distance bandwidth used for the auto-covariance function
diagTable - data.frame of regression diagnostics
coefTable - data.frame of regression coefficients
Residuals - data.frame of residuals and standardized residuals
AutoCov - If an auto-logistic model, AutoCov represents lagged auto-covariance term

References

Besag, J.E., (1972) Nearest-neighbour systems and the auto-logistic model for binary data. Journal of the Royal Statistical Society, Series B Methodological 34:75-83

Dormann, C.F., (2007) Assessing the validity of autologistic regression. Ecological Modelling 207:234-242

Le Cessie, S., Van Houwelingen, J.C., (1992) Ridge estimators in logistic regression. Applied Statistics 41:191-201

Shao, J., (1993) Linear model selection by cross-validation. JASA 88:486-494

Author

Jeffrey S. Evans jeffrey_evans@tnc.org

Examples

require(sp)
require(spdep)
require(rms)                                                                       
#> Loading required package: rms
#> Loading required package: Hmisc
#> Loading required package: lattice
#> Loading required package: survival
#> 
#> Attaching package: 'survival'
#> The following object is masked from 'package:spatialEco':
#> 
#>     concordance
#> Loading required package: Formula
#> Loading required package: ggplot2
#> 
#> Attaching package: 'Hmisc'
#> The following object is masked from 'package:rgeos':
#> 
#>     translate
#> The following objects are masked from 'package:raster':
#> 
#>     mask, zoom
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units
#> Loading required package: SparseM
#> 
#> Attaching package: 'SparseM'
#> The following object is masked from 'package:base':
#> 
#>     backsolve
#> Warning: undefined subclass "numericVector" of class "Mnumeric"; definition not updated
data(meuse)
  coordinates(meuse) <- ~x+y  
    meuse@data <- data.frame(DepVar=rbinom(dim(meuse)[1], 1, 0.5), 
                              meuse@data)

#### Logistic model
lmodel <- logistic.regression(meuse, y='DepVar', 
                  x=c('dist','cadmium','copper')) 
  lmodel$model
#> Logistic Regression Model
#>  
#>  rms::lrm(formula = form, data = ldata, x = TRUE, y = TRUE, penalty = bf$penalty)
#>  
#>  
#>  Penalty factors
#>  
#>   simple nonlinear interaction nonlinear.interaction
#>        1         1           1                     1
#>  
#>                       Model Likelihood    Discrimination    Rank Discrim.    
#>                             Ratio Test           Indexes          Indexes    
#>  Obs           155    LR chi2     7.39    R2       0.061    C       0.616    
#>   0             88    d.f.       2.623    g        0.483    Dxy     0.232    
#>   1             67    Pr(> chi2)0.0447    gr       1.621    gamma   0.232    
#>  max |deriv| 2e-09    Penalty     0.24    gp       0.110    tau-a   0.115    
#>                                           Brier    0.235                     
#>  
#>            Coef    S.E.   Wald Z Pr(>|Z|) Penalty Scale
#>  Intercept  0.7961 0.5882  1.35  0.1759    0.0000      
#>  dist      -1.0084 1.0439 -0.97  0.3340    0.1977      
#>  cadmium   -0.0440 0.1173 -0.38  0.7076    3.5237      
#>  copper    -0.0175 0.0171 -1.03  0.3037   23.6804      
#>  
    lmodel$diagTable
#>         Names        Value
#> 1         Obs 1.550000e+02
#> 2   Max Deriv 2.107191e-09
#> 3  Model L.R. 7.393192e+00
#> 4        d.f. 2.623476e+00
#> 5           P 4.466970e-02
#> 6           C 6.159261e-01
#> 7         Dxy 2.318521e-01
#> 8       Gamma 2.320489e-01
#> 9       Tau-a 1.145371e-01
#> 10         R2 6.054035e-02
#> 11      Brier 2.346796e-01
#> 12          g 4.832799e-01
#> 13         gr 1.621384e+00
#> 14         gp 1.098474e-01
#> 15        PEN 1.000000e+00
#> 16        AIC 2.146239e+00
      lmodel$coefTable
#>    Variable        Coef   StdError       Wald      Prob
#> 1 Intercept  0.79612771 0.58816259  1.3535844 0.1758690
#> 2      dist -1.00843098 1.04392027 -0.9660038 0.3340423
#> 3   cadmium -0.04401098 0.11731756 -0.3751440 0.7075534
#> 4    copper -0.01753671 0.01705085 -1.0284945 0.3037173

#### Logistic model with factorial variable
lmodel <- logistic.regression(meuse, y='DepVar', 
            x=c('dist','cadmium','copper', 'soil')) 
  lmodel$model
#> Logistic Regression Model
#>  
#>  rms::lrm(formula = form, data = ldata, x = TRUE, y = TRUE, penalty = bf$penalty)
#>  
#>  
#>  Penalty factors
#>  
#>   simple nonlinear interaction nonlinear.interaction
#>        1         1           1                     1
#>  
#>                       Model Likelihood    Discrimination    Rank Discrim.    
#>                             Ratio Test           Indexes          Indexes    
#>  Obs           155    LR chi2     7.65    R2       0.062    C       0.612    
#>   0             88    d.f.       4.278    g        0.495    Dxy     0.224    
#>   1             67    Pr(> chi2)0.1231    gr       1.640    gamma   0.225    
#>  max |deriv| 2e-09    Penalty     0.27    gp       0.113    tau-a   0.111    
#>                                           Brier    0.234                     
#>  
#>            Coef    S.E.   Wald Z Pr(>|Z|) Penalty Scale
#>  Intercept  0.8395 0.5969  1.41  0.1596    0.0000      
#>  dist      -1.0111 1.2272 -0.82  0.4100    0.1977      
#>  cadmium   -0.0443 0.1176 -0.38  0.7064    3.5237      
#>  copper    -0.0179 0.0171 -1.04  0.2965   23.6804      
#>  soil=2    -0.1259 0.4050 -0.31  0.7559    0.8165      
#>  soil=3     0.1227 0.6643  0.18  0.8535    0.8165      
#>  
    lmodel$diagTable
#>         Names         Value
#> 1         Obs  1.550000e+02
#> 2   Max Deriv  1.865718e-09
#> 3  Model L.R.  7.652394e+00
#> 4        d.f.  4.278462e+00
#> 5           P  1.230936e-01
#> 6           C  6.121947e-01
#> 7         Dxy  2.243894e-01
#> 8       Gamma  2.245036e-01
#> 9       Tau-a  1.108504e-01
#> 10         R2  6.236510e-02
#> 11      Brier  2.342899e-01
#> 12          g  4.945176e-01
#> 13         gr  1.639707e+00
#> 14         gp  1.126479e-01
#> 15        PEN  1.000000e+00
#> 16        AIC -9.045299e-01
      lmodel$coefTable
#>    Variable        Coef   StdError       Wald      Prob
#> 1 Intercept  0.83948551 0.59687893  1.4064586 0.1595880
#> 2      dist -1.01106194 1.22716994 -0.8238973 0.4099979
#> 3   cadmium -0.04428282 0.11755836 -0.3766880 0.7064055
#> 4    copper -0.01788235 0.01712859 -1.0440063 0.2964825
#> 5    soil=2 -0.12589581 0.40497832 -0.3108705 0.7558991
#> 6    soil=3  0.12268901 0.66429948  0.1846893 0.8534727

### Auto-logistic model using 'autocov_dist' in 'spdep' package
lmodel <- logistic.regression(meuse, y='DepVar', 
            x=c('dist','cadmium','copper'), autologistic=TRUE, 
            coords=coordinates(meuse), bw=5000) 
  lmodel$model
#> Logistic Regression Model
#>  
#>  rms::lrm(formula = form, data = ldata, x = TRUE, y = TRUE, penalty = bf$penalty)
#>  
#>  
#>  Penalty factors
#>  
#>   simple nonlinear interaction nonlinear.interaction
#>        1         1           1                     1
#>  
#>                       Model Likelihood    Discrimination    Rank Discrim.    
#>                             Ratio Test           Indexes          Indexes    
#>  Obs           155    LR chi2    10.49    R2       0.085    C       0.628    
#>   0             88    d.f.       3.577    g        0.602    Dxy     0.256    
#>   1             67    Pr(> chi2)0.0240    gr       1.825    gamma   0.256    
#>  max |deriv| 1e-08    Penalty     0.41    gp       0.136    tau-a   0.126    
#>                                           Brier    0.230                     
#>  
#>            Coef    S.E.   Wald Z Pr(>|Z|) Penalty Scale
#>  Intercept  3.7502 1.8505  2.03  0.0427    0.0000      
#>  dist      -1.0663 1.0486 -1.02  0.3092    0.1977      
#>  cadmium   -0.0286 0.1195 -0.24  0.8112    3.5237      
#>  copper    -0.0220 0.0176 -1.25  0.2122   23.6804      
#>  AutoCov   -6.5606 3.8810 -1.69  0.0909    0.0434      
#>  
    lmodel$diagTable
#>         Names        Value
#> 1         Obs 1.550000e+02
#> 2   Max Deriv 1.480466e-08
#> 3  Model L.R. 1.049036e+01
#> 4        d.f. 3.577436e+00
#> 5           P 2.403887e-02
#> 6           C 6.279681e-01
#> 7         Dxy 2.559362e-01
#> 8       Gamma 2.560665e-01
#> 9       Tau-a 1.264349e-01
#> 10         R2 8.450814e-02
#> 11      Brier 2.303498e-01
#> 12          g 6.015511e-01
#> 13         gr 1.824947e+00
#> 14         gp 1.356461e-01
#> 15        PEN 1.000000e+00
#> 16        AIC 3.335485e+00
      lmodel$coefTable
#>    Variable        Coef   StdError       Wald       Prob
#> 1 Intercept  3.75019690 1.85052025  2.0265636 0.04270707
#> 2      dist -1.06628332 1.04859510 -1.0168685 0.30921599
#> 3   cadmium -0.02855437 0.11952431 -0.2389001 0.81118303
#> 4    copper -0.02196396 0.01760738 -1.2474288 0.21224033
#> 5   AutoCov -6.56060992 3.88103656 -1.6904272 0.09094625
  est <- predict(lmodel$model, type='fitted.ind')

#### Add residuals, standardized residuals and estimated probabilities
VarNames <- rownames(lmodel$model$var)[-1]
  meuse@data$AutoCov <- lmodel$AutoCov
    meuse@data <- data.frame(meuse@data, Residual=lmodel$Residuals[,1], 
                     StdResid=lmodel$Residuals[,2], Probs=predict(lmodel$model, 
                     meuse@data[,VarNames],type='fitted') )  

#### Plot fit and probabilities
resid(lmodel$model, "partial", pl="loess") 
# plot residuals
resid(lmodel$model, "partial", pl=TRUE) 

# global test of goodness of fit 
resid(lmodel$model, "gof")
#> Sum of squared errors     Expected value|H0                    SD 
#>            35.7042169            35.7891682             0.1244555 
#>                     Z                     P 
#>            -0.6825841             0.4948697 

# Approx. leave-out linear predictors
lp1 <- resid(lmodel$model, "lp1")            

# Approx leave-out-1 deviance            
-2 * sum(meuse@data$DepVar * lp1 + log(1-plogis(lp1)))
#> [1] 211.513

# plot estimated probabilities at points
spplot(meuse, c('Probs'))

Arguments

Value

References

See also

Author

Examples