5. Development of statistical models
Multiple regression was used to examine relationships between median E. coli and environmental factors, and to derive a statistical tool with which to predict median concentrations across the region. Median E. coli concentration was used as the dependent variable, with the environmental factors in Table 2 as the independent variables. The strength of relationships was assessed using the coefficient of determination (R2), expressed as a percentage and adjusted for degrees of freedom. All environmental factors were examined in an interactive stepwise selection procedure, using DataDesk software, regardless of the strength of their bivariate relationship (section 4.2) with median E. coli. It is important to note that during this process independent variables were retained even if they were correlated with other independent variables in the model. The analysis derived a predictive model in which four factors together explained 69% of the variance in median E. coli across the region. Each factor is given in Table 4, together with its partial R2, reported as the R2 reached by the regression model once that factor had been added.
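The selection procedure described above can be illustrated with a minimal sketch. It is not the DataDesk procedure used in the study, which was interactive: it is an automated forward selection that adds, at each step, the variable giving the largest adjusted R2. The data are synthetic, and the variable names (x0 to x4), the 60-site sample size and the 1-point stopping threshold are all assumptions made for illustration.

```python
import numpy as np

def adjusted_r2(y, X):
    """Adjusted R2 (as a percentage) of an OLS fit of y on X plus an intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # add the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Adjust for degrees of freedom: n observations, p predictors.
    return 100.0 * (1.0 - (1.0 - r2) * (n - 1) / (n - p - 1))

def forward_select(y, X, names, min_gain=1.0):
    """Add variables one at a time while adjusted R2 rises by at least min_gain points."""
    chosen, history = [], []
    best = 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [(adjusted_r2(y, X[:, chosen + [j]]), j) for j in remaining]
        score, j = max(scores)
        if score - best < min_gain:
            break
        chosen.append(j)
        remaining.remove(j)
        best = score
        history.append((names[j], round(score, 1)))  # variable, adjusted R2 so far
    return history

# Synthetic data only (not the study's data): two of five variables matter.
rng = np.random.default_rng(0)
n_sites = 60
X = rng.normal(size=(n_sites, 5))
names = ["x0", "x1", "x2", "x3", "x4"]
y = 100.0 + 5.0 * X[:, 0] + 3.0 * X[:, 2] + rng.normal(size=n_sites)

history = forward_select(y, X, names)
for name, r2 in history:
    print(name, r2)
```

Each row printed is a variable and the adjusted R2 of the model after that variable was added, which is the same layout as the partial R2 column in Table 4.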
Table 4. The statistical model derived from all sites across the region.
| Variable | Coefficient | Partial R2 (%) | Comments |
|---|---|---|---|
| Constant | 99.2 | | Intercept |
| %Poordrain | 5.5 | 47.3 | % of poorly drained soil |
| TurbMedian | 9.4 | 49.2 | Median turbidity |
| Cattle | 0.14 | 52.6 | Cattle stock units |
| NonDairyPtSource | 15.6 | 69.2 | Volume of non-dairy point sources |
The factors were: the percentage of land with poorly drained soil, median turbidity, cattle density (stock units/km2), and the volume of non-dairy point source discharge (m3/day/km2), providing the following relationship (Equation 1).
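Median E. coli = 99.2 + 5.5 (%Poordrain) + 9.4 (TurbMedian) + 0.14 (Cattle) + 15.6 (NonDairyPtSource) ...Equation 1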
The addition of non-dairy point sources to the statistical model increased the variance explained from 55% to 69%, but this was primarily attributed to the median value at just one site (site 64, see section 4.2.8). Since non-dairy point sources are not a strong predictor of median E. coli across the region, a second regression model was developed which excluded site 64 from the analysis. In this model four factors explained 68% of the variance (Equation 2 and Table 5), with the first three factors being common to both models.
Median E. coli = 195.6 + 4.1 (%Poordrain) + 8.4 (TurbMedian) + 0.17 (Cattle) - 1.4 (%Welldrain) ...Equation 2
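As an illustration of how the two models might be applied to an unmonitored catchment, the sketch below evaluates each as a linear sum of the coefficients in Tables 4 and 5. The function name and the catchment values are hypothetical, chosen only to show the arithmetic. The inputs are assumed to be in the units given in the text (percentage of land, median turbidity, stock units/km2, m3/day/km2).

```python
# Coefficients copied from Table 4 (all sites) and Table 5 (site 64 excluded).
MODEL_1 = {
    "Constant": 99.2,
    "%Poordrain": 5.5,
    "TurbMedian": 9.4,
    "Cattle": 0.14,
    "NonDairyPtSource": 15.6,
}
MODEL_2 = {
    "Constant": 195.6,
    "%Poordrain": 4.1,
    "TurbMedian": 8.4,
    "Cattle": 0.17,
    "%Welldrain": -1.4,
}

def predict_median_ecoli(model, catchment):
    """Intercept plus the sum of coefficient * value for each factor in the model."""
    total = model["Constant"]
    for name, coefficient in model.items():
        if name != "Constant":
            total += coefficient * catchment[name]
    return total

# Hypothetical catchment, invented for illustration (not a site from the study).
catchment = {
    "%Poordrain": 20.0,       # % of land with poorly drained soil
    "TurbMedian": 5.0,        # median turbidity
    "Cattle": 100.0,          # cattle stock units/km2
    "NonDairyPtSource": 0.0,  # non-dairy point source volume, m3/day/km2
    "%Welldrain": 50.0,       # % of land with well drained soil
}

prediction_1 = predict_median_ecoli(MODEL_1, catchment)
prediction_2 = predict_median_ecoli(MODEL_2, catchment)
print(round(prediction_1, 1), round(prediction_2, 1))
```

For these invented inputs the two models give about 270 and 267 respectively. With no non-dairy point source discharge, the difference comes from the intercepts and the well drained soil term.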
Co-linearity is apparent between the independent variables within the models (Table 3), notably between the percentage of poorly drained soil and median turbidity (R = 0.79). Because of this co-linearity, Equations 1 and 2 should not be used to draw inferences about the relative contributions to median E. coli concentrations made by each of the independent variables. The equations will be most reliable for predicting E. coli concentrations in unmonitored catchments where the relationships between independent variables are similar to those in the original dataset. This is likely to hold in most places throughout the Waikato Region but may not apply elsewhere in the country. Both models are characterised by fairly high intercepts (99 and 196), reflecting faecal contamination in the absence of grazing livestock (see section 4.2.2). Implications drawn from the statistical models are discussed in section 6.
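One common way to gauge how much co-linearity inflates the uncertainty of a coefficient is the variance inflation factor (VIF). The study does not report VIFs, so the sketch below is an added illustration. It uses only the pairwise correlation quoted above and synthetic data. In the real models the VIF of each variable would depend on all the other predictors, not just one pair.

```python
import numpy as np

def vif_from_pairwise_r(r):
    """VIF when a predictor is correlated (coefficient r) with one other predictor."""
    return 1.0 / (1.0 - r ** 2)

def vif(X):
    """VIF of each column of X, from regressing it on the remaining columns."""
    n, p = X.shape
    factors = []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ beta
        ss_tot = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r2 = 1.0 - (resid @ resid) / ss_tot
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Two-predictor approximation using the R = 0.79 quoted in the text.
vif_quoted = vif_from_pairwise_r(0.79)

# Synthetic pair of predictors built to be correlated at roughly 0.79.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = 0.79 * a + np.sqrt(1.0 - 0.79 ** 2) * rng.normal(size=200)
X = np.column_stack([a, b])
r_sample = np.corrcoef(a, b)[0, 1]
vifs = vif(X)

print(round(vif_quoted, 2), np.round(vifs, 2))
```

With two predictors the regression-based VIF equals 1 / (1 - r^2) exactly. A correlation of 0.79 therefore corresponds to a coefficient variance roughly 2.7 times what it would be with uncorrelated predictors. This is one way to see why individual coefficients should not be read as relative contributions.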
Table 5. The statistical model derived excluding site 64 (with an unusually high point discharge).
| Variable | Coefficient | Partial R2 (%) | Comments |
|---|---|---|---|
| Constant | 195.6 | | Intercept |
| %Poordrain | 4.1 | 59.9 | % of poorly drained soil |
| TurbMedian | 8.4 | 62.9 | Median turbidity |
| Cattle | 0.17 | 65.9 | Cattle stock units |
| %Welldrain | -1.4 | 67.5 | % of well drained soil |