What is Data?

Data is information describing some phenomenon.

What is a Phenomenon?

A factor situation that is observed to exist or happen, especially one whose cause or explanation is in question.

A lightning strike

What is a Phenomenon?

A factor situation that is observed to exist or happen, especially one whose cause or explanation is in question.

A lightning strike
A coastline

What is a Phenomenon?

A factor situation that is observed to exist or happen, especially one whose cause or explanation is in question.

A lightning strike
A coastline
A country

What is a Phenomenon?

A factor situation that is observed to exist or happen, especially one whose cause or explanation is in question.

A lightning strike
A coastline
A country
A dog on a kayak!

Anything and everything are phenomena!

Types of Phenomena

Discrete Objects

Distinct boundaries
Chat can be exactly measured
Finite
They are countable and cannot be infinitely subdivided

Continuous Fields

No distinct boundaries
Everywhere has a value
Infinitely divisible
They are not countable and can be infinitely subdivided

Types of Phenomena

When is a phenomenon discrete or continuous?

To an extent, it depends on our perspective and the scale of our analysis.
Many phenomenon are a bit of both.

Lightning

A strike is a discrete object, what about a lighting bolt?

Sort of continuous?

Lightning

A strike is a discrete object, what about a lighting bolt?

Strike frequency is a continuous field
- Everywhere has a value
- Even the absence of strikes, is a frequency of strikes

A Coastline

Continuous field at large scale

Tides & waves
Where is the exact boundary?

A Coastline

Discrete object at small scale

Zoom out and the tides/waves don’t really matter
Its easy to draw a static boundary

A Coastline

Unless you change the time scale

Types of Phenomena

Most things don’t fall perfectly into one category or the other.

That said, it is a helpful framework as long as we recognize the discrete vs. continuous dichotomy is not a perfect classification

TopHat Question 1

Discrete objects: (select all that apply)

Are countable
Do not have distinct boundaries
Are infinitely divisible
Have well defined boundaries

Discrete Objects

Buildings are a great example.

Concrete boundaries
Countable
Real physical object

Discrete Objects

Political Boundaries are also a great example.

Distinct boundaries
Countable
Not a physical object

Continuous Fields

Elevation is a great example.

Everywhere on Earth
No “number of elevations”
A physical property

Continuous Fields

Density of tweets is also a great example.

Everywhere has this too
Derived from something countable
But not countable itself
Not a physical property

Working Together

Frequently we’ll end up working with both discrete objects and continuous fields.

In Module 1, you worked with:
- Cholera deaths
  - Discrete objects
- Kernel density
  - Continuous field

Digital information

We’ll talk more about spatial data models later. For now, lets think about data more broadly.

How do we represent data in a computer?

Digital information

Digital information is represented as bits (0’s and 1’s)

We typically quantify data as bytes (8 bits):

Kilobyte (kB) = 1,000 bytes
Megabyte (MB) = 1,000,000 bytes
Gigabyte (GB) = 1,000,000,000 bytes

Digital information

There are numerous ways to translate human readable data to binary, such as ASCII.

Each character is represented as one byte
2⁸ = 256 unique combinations of 0’s and 1’s in a byte
Some examples:
- “A” : 01000001
- “CAT”: 01000011 01000001 01010100
- “31”: 00110011 00110001

Digital Information

Modern computers use 64-bit “architecture”. The central processing unit (CPU) can handle 64 bits (8 bytes) of information at a time.

“Word” length is 64 bits, a word is a unit of data
- i.e., an individual piece of information
- 18 quintillion unique combinations of 1’s and 0’s
CPUs can be stacked in parallel to handle more information at one time

Representing Phenomena in GIS

Within the context of a GIS, every piece of information describing a phenomenon is referred to as an Attribute.

Broadly speaking each attribute can address one of three questions:
- Where?
- What?
- When?

Types of Attributes

There are multiple ways to classify/think about attributes. One important distinction we must make

Non-Spatial Attributes: describe what or when
Spatial Attributes: describe where
- Puts the Geographic in GIS
- Requires some special considerations
- We have already talked a bit about map projections
- We’ll discuss more considerations in the next module

Types of Attributes

All data (attributes), spatial and non-spatial, can be either qualitative or quantitative.

Analysis we can do with qualitative data are more limited
- Does not make quantitative data “better”
Measurement scales: both qualitative and quantitative can be measured on different scales
- Qualitative: Nominal or Ordinal Sales
- Quantitative: Interval or Ratio Sales

Qualitative Data

Qualitative data is Categorical. It is strictly descriptive and lacks any meaningful numeric value.

Textual, coded numerals, pictures, sounds, etc
- Typically working with textual & coded numerals most frequently in GIS
Limited number of computational options, often requires careful consideration when analyzing
Measured on either a Nominal or Ordinal scale.

Nominal Scale

Names or categories with no ranking or direction. Categories are not more/less, better/worse, they just different. Some examples include:

Flower Species

Nominal Scale

Names or categories with no ranking or direction. Categories are not more/less, better/worse, they just different. Some examples include:

Flower Species
Zoning Categories

Nominal Scale

Names or categories with no ranking or direction. Categories are not more/less, better/worse, they just different. Some examples include:

Flower Species
Zoning Categories
Land cover Classification

Nominal Operations

With nominal data we can:

Check equivalency
Count frequencies
Nothing else

Ordinal Scale

Names or categories with a ranking. The differences are relative. Categories are more/less, better/worse, etc.

Spice levels

Ordinal Scale

Names or categories with a ranking. The differences are relative. Categories are more/less, better/worse, etc.

Spice levels
Relative heights

Ordinal Scale

Names or categories with a ranking. The differences are relative. Categories are more/less, better/worse, etc.

Spice levels
Relative heights
Compass Direction

Ordinal Operations

All the same operations as nominal data + more. With ordinal data we can:

Check equivalency
Count frequencies
Check order/rank

Ordinal Operations

Sometimes we can calculate the median.

Odd sets the median is the middle.
Even sets, average of the middle two.
One solution, arbitrarily assign a numeric score.

Graded Membership

Exceptions that blur the lines. Where to draw the line between forest/alpine?

Grade membership to assign categories
Winner take all: alpine meadow
- 45% alpine meadow
- 40% forest
- 5% bare rock

Graded Membership

In practice, lots of qualitative data we work with, especially for natural phenomena, are actually graded membership.

The downside: variability within the area is lost.

TopHat Question 2

Which of the following would be examples of Nominal Data? (select all that apply)

Air temperature
Ice cream flavors
Tree height
Colors
Drink sizes

Quantitative Data

Quantitative data is Numeric. It describe the quantities associated with a phenomenon. Key properties include:

Values separated by a meaningful unit.
More arithmetic operations possible.
Can be Discrete or Continuous numbers.
Measured on either a Ratio or Interval scale.

Kinds of Numbers

Discrete

Whole numbers
Counts
Not infinitely divisible
Names in ArcGIS Pro:
Integer, Long, etc.

Continuous

Decimals
Measurements
Infinitely divisible
Names in ArcGIS Pro:
Float, Double, etc.

Kinds of Numbers

Discrete

Countable
Examples:
Population
Year
“Age”

Continuous

Non-countable
Examples:
Temperature
Height
Speed

Quantitative Data

Both Interval and Ratio data can consist of discrete or continuous numbers. These types of quantitative data are closely related, but have one important distinction.

Interval scales have an arbitrary zero point
- Can be negative
- Cannot multiply/divide
  - To compare magnitudes

Ratio scales have a fixed, absolute zero point
- Can multiply/divide
- Cannot be negative
  - Can increase infinitely from zero

Celsius (interval) vs. Kelvin (ratio)

°C = K-273.15.

0 °C: Freezing point of water
- Drops below 0 °C all the time
0 K: “Absolute Zero”
- Physically cannot get any colder

Celsius (interval) vs. Kelvin (ratio)

°C = K-273.15.

100 °C is not ∞% warmer than as 0 °C
- It’s actually ~ 36% warmer
- (373.15 K - 273.15 K) ⁄ 273.15 K ~ 0.36

Interval Scale

Interval data has an arbitrary zero point.

Calendar years
- Discrete interval data
Temperature (in celsius)
- Continuous interval data
Other examples:
- ph scale (continuous)
- Times (discrete-ish)

Interval Scale

Interval data has an arbitrary zero point.

Calendar years
- Discrete interval data
Temperature (in celsius)
- Continuous interval data
Other examples:
- ph scale (continuous)
- Times (discrete-ish)

Ratio Scale

Ratio data has a fixed, absolute zero point.

Population
- Discrete ratio data
Tree height
- Continuous ratio data
Other examples:
- Precipitation (Continuous)
- Vote Totals (Discrete)

Ratio Scale

Ratio data has a fixed, absolute zero point.

Population
- Discrete ratio data
Tree height
- Continuous ratio data
Other examples:
- Precipitation (Continuous)
- Vote Totals (Discrete)

TopHat Question 3

Match the value to the type measurement scale and type of number:

Length a hiking trail	Interval (Discrete)
Temperature in Fahrenheit	Ratio (Discrete)
Global Orca Population	Ratio (Continuous)
Change in Global Orca Population from 2000 to 2022	Interval (Continuous)

Derived Ratio

Sometimes called normalizing or standardizing, we calculate derived ratios to account for the influence of a confounding variable over a variable of interest. e.x. Housing affordability (Ha):

You need to account for income (I) to figure out how affordable rent (R) is: $H\_a = \frac{R}{I}$
- Ha: 31.5% of my income goes to rent
- Income and rent ($) are both discrete, housing affordability (%) is continuous.

Derived Ratio

In Lab, you are going to work with two derived ratios:

Income and Food expenditures are correlated
- Need to account for income if you analyze other factors

Derived Ratio

In Lab, you are going to work with two derived ratios:

Population and Area are not highly correlated
- But area definitely influences population
- Need to account for area to analyze other factors

TopHat Question 4

Speed is another example of a derived ratio. If a line of thunderstorm takes 5 hours to travel from Brandon, MB to Winnipeg, MB (200 km), what is the storm’s speed in km/hr?

Summary: Types of Data

Summary: Operations

Operation	Nominal	Ordinal	Interval	Ratio
Equality	x	x	x	x
Counts/Mode	x	x	x	x
Rank/Order		x	x	x
Median		~	x	x
Add/Subtract			x	x
Mean			x	x
Multiply/Divide				x