Uncertainty = Accuracy + Precision + Ambiguity + Vagueness + Logical Fallacies
Arises from our inability to measure phenomena perfectly and flaws in our conceptual models.
There is no standardized measure of data quality in GIS.
Data must be assessed on a case-by-case basis.
Accuracy and precision are related, but the distinction is very important.
Accuracy: The degree to which a set of measurements correctly matches the real-world values.
Precision: The degree of agreement between multiple measurements of the same real-world phenomenon.
Accuracy relates to bias: it is affected by systematic errors in a measurement. Precision is independent of bias: it is affected by ___ errors in a measurement.
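A minimal sketch of the distinction, assuming a hypothetical benchmark with a known elevation and two made-up sets of repeat readings. The helper names `bias` and `spread` are illustrative, not standard GIS terms: the first summarizes how far the readings sit from the true value on average, the second how closely they agree with each other.

```python
import statistics

TRUE_ELEVATION = 100.0  # hypothetical benchmark elevation, in metres

# Made-up repeat readings from two hypothetical instruments.
tight_but_offset = [103.0, 103.1, 102.9, 103.0]     # agree closely, but all too high
centred_but_scattered = [97.0, 103.0, 99.0, 101.0]  # average is right, readings disagree

def bias(readings, truth):
    """Average offset from the true value (an accuracy problem)."""
    return statistics.mean(readings) - truth

def spread(readings):
    """Population standard deviation of the readings (a precision problem)."""
    return statistics.pstdev(readings)

for name, readings in [("tight_but_offset", tight_but_offset),
                       ("centred_but_scattered", centred_but_scattered)]:
    print(name, round(bias(readings, TRUE_ELEVATION), 2), round(spread(readings), 2))
```

The first set is precise but not accurate (small spread, large offset); the second is accurate on average but not precise.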
Statistical methods can be used to quantify error.
Mean Absolute Error (MAE):
$MAE = \frac{\sum_{i=1}^N \lvert{x_i-t_i}\rvert}{N}$
Mean Squared Error (MSE):
$MSE = \frac{\sum_{i=1}^N \left({x_i-t_i}\right)^2}{N}$
Root Mean Squared Error (RMSE):
$RMSE = \sqrt{\frac{\sum_{i=1}^N \left({x_i-t_i}\right)^2}{N}}$
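The three error metrics above can be sketched in a few lines. This assumes $x_i$ is a measured value and $t_i$ is the corresponding true (reference) value, which the formulas imply but do not state. The elevation numbers are made up for illustration.

```python
import math

def mae(measured, truth):
    """Mean absolute error: average size of the errors, ignoring sign."""
    return sum(abs(x - t) for x, t in zip(measured, truth)) / len(measured)

def mse(measured, truth):
    """Mean squared error: squaring penalizes large errors more heavily."""
    return sum((x - t) ** 2 for x, t in zip(measured, truth)) / len(measured)

def rmse(measured, truth):
    """Root mean squared error: MSE brought back to the original units."""
    return math.sqrt(mse(measured, truth))

# Hypothetical elevations in metres: reference values vs. values read from a raster.
truth = [10.0, 20.0, 30.0, 40.0]
measured = [11.0, 19.0, 33.0, 39.0]

print(mae(measured, truth))   # 1.5
print(mse(measured, truth))   # 3.0
print(rmse(measured, truth))  # square root of 3, about 1.73
```

RMSE is larger than MAE here because the single 3 m error is weighted more heavily once it is squared.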
Standard Deviation ($\sigma$):
$\sigma=\sqrt{\frac{\sum_{i=1}^N \left({x_i-\overline{X}}\right)^2}{N}}$
Confidence Intervals (CI):
Used to convey our confidence in an estimated average value.
$CI = \overline{X} \pm z\frac{\sigma}{\sqrt{N}}$
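A small sketch of the standard deviation and confidence interval calculations, using the population form of $\sigma$ (dividing by $N$) as written above. The margin $z\,\sigma/\sqrt{N}$ is taken on either side of the mean. The default $z = 1.96$ corresponds to a 95% confidence level under a normal approximation, and the readings are invented for illustration.

```python
import math

def population_std(values):
    """Standard deviation dividing by N, matching the formula above."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

def confidence_interval(values, z=1.96):
    """Return (lower, upper) bounds around the mean; z=1.96 is roughly 95%."""
    n = len(values)
    mean = sum(values) / n
    margin = z * population_std(values) / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical repeat measurements of the same quantity.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(population_std(readings))       # 2.0
print(confidence_interval(readings))  # roughly (3.61, 6.39)
```

Collecting more samples shrinks the interval, since the margin scales with $1/\sqrt{N}$.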
Interquartile Range (IQR):
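The IQR is the distance between the first and third quartiles, i.e. the spread of the middle half of the data. A minimal sketch using Python's standard library follows. The choice of the `"inclusive"` quartile method is an assumption, since other quartile conventions give slightly different numbers, and the data are made up.

```python
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical samples; the second replaces the largest value with an outlier.
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9]
samples_with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

print(iqr(samples))               # 4.0
print(iqr(samples_with_outlier))  # 4.0
```

The outlier leaves the IQR unchanged, which is why it is often preferred over the standard deviation for skewed data.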
Which of the following metrics can be used to describe the accuracy of an estimate?
Vagueness and ambiguity are also related. The distinction is subtle and can be confusing. They aren't synonymous, but you can essentially treat them as such. Often when something is vague, it is also ambiguous, and vice versa.
Vagueness: When something is not clearly stated or defined.
Ambiguity: When something can reasonably be interpreted in multiple ways.
These statements are vague because they lack detail. They are ambiguous because they have multiple interpretations.
The positions of objects are unclear or changeable.
Ambiguity and vagueness are difficult to quantify numerically. But they still must be addressed whenever possible.
The key with these issues:
Where does uncertainty come from and what can we do to minimize it?
Uncertainty = Accuracy + Precision + Ambiguity + Vagueness + Logical Fallacies
Some sources of error are out of our control. The instruments we use to collect data can only be so precise.
The concentration of samples in space and time dictates the level of accuracy and precision you can attain.
Things we do have some control over:
Errors that arise when creating vector features:
Digitization errors arise when we manually create features.
Since geographic phenomena often don’t have clear, natural units, we are forced to assign zones and labels in our work (e.g., census data).
Much of the data we use to learn about society is collected in aggregate. We take average values for many individuals within a group or area (e.g., census data).
Even with "perfect" data, GIS operations can add uncertainty:
Logical Fallacy: A flaw in our reasoning that undermines the logic of our argument.
Taking data collected or presented in aggregate for a group or region and applying it to an individual or specific place.
When we take aggregated data and aggregate it again at a higher level. You can't take the average of averages unless each group is weighted by its size.
Imagine a population of turtles living on logs in a pond. You work for the turtle census and visit each log and ask the turtles their age.
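The turtle census can be sketched with made-up numbers to show why averaging the per-log averages goes wrong when logs hold different numbers of turtles. The log names and ages below are hypothetical.

```python
# Hypothetical census results: ages of the turtles found on each log.
logs = {
    "log_a": [50, 70],                # two old turtles
    "log_b": [5, 5, 10, 10, 10, 20],  # six young turtles
}

def mean(values):
    return sum(values) / len(values)

# Step 1: aggregate once, one average age per log.
log_means = {name: mean(ages) for name, ages in logs.items()}

# Step 2 (flawed): aggregate again by averaging the averages.
mean_of_means = mean(list(log_means.values()))

# Correct pond-wide average: go back to the individual turtles.
all_ages = [age for ages in logs.values() for age in ages]
true_mean = mean(all_ages)

# Equivalent fix when only the averages are available: weight each by its count.
weighted_mean = (sum(log_means[name] * len(ages) for name, ages in logs.items())
                 / len(all_ages))

print(log_means)      # {'log_a': 60.0, 'log_b': 10.0}
print(mean_of_means)  # 35.0
print(true_mean)      # 22.5
print(weighted_mean)  # 22.5
```

The unweighted average of averages (35) overstates the pond-wide mean age (22.5) because the two-turtle log counts as much as the six-turtle log.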
Modifiable, arbitrary boundaries can have a significant impact on descriptive statistics for areas.
Data collected at a finer level of detail is combined into larger, lower-detail areas whose boundaries can be manipulated.
Gerrymandering exploits the atomistic fallacy to skew election results.
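A toy sketch of how redrawing boundaries changes the aggregate result while the underlying data stay the same. It assumes nine hypothetical voters arranged in a row and two invented ways of cutting that row into three contiguous districts. The function name `district_winners` and the boundary tuples are illustrative only.

```python
from collections import Counter

# Hypothetical voters in a row: 4 support party A, 5 support party B.
voters = list("AABAABBBB")

def district_winners(voters, boundaries):
    """Return the winning party in each district.

    boundaries is a list of (start, stop) index pairs, one per district.
    The example districts are chosen so that no district ends in a tie.
    """
    winners = []
    for start, stop in boundaries:
        counts = Counter(voters[start:stop])
        winners.append(counts.most_common(1)[0][0])
    return winners

scheme_1 = [(0, 3), (3, 6), (6, 9)]  # three districts of three voters
scheme_2 = [(0, 5), (5, 7), (7, 9)]  # same voters, boundaries shifted

print(Counter(voters))                     # Counter({'B': 5, 'A': 4})
print(district_winners(voters, scheme_1))  # ['A', 'A', 'B']
print(district_winners(voters, scheme_2))  # ['A', 'B', 'B']
```

Under the first scheme party A wins two of three districts despite having fewer votes overall; under the second, party B does. Nothing about the voters changed, only the zones.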
Errors are cumulative: