This document discusses displaying and querying data. It describes categorical versus continuous data and different classification schemes for mapping data, including natural breaks, equal interval, and quantile methods. Querying allows asking questions of the data by examining attributes or selecting features based on geometry or attributes. The goals are to understand how to best present data based on the data type, location, and audience as well as subset data to highlight specific information.
2. Objectives
Be able to describe categorical versus continuous data
Be able to select an appropriate classification scheme and number of classes for your data
Understand why querying adds value to your data
Be able to describe selection
3. Displaying Data
Three ideas to keep in mind:
The data WHAT?
The area WHERE?
The audience WHO?
How the data is presented directly affects how the data is interpreted.
A map that is simple to interpret can be the hardest to make!
4. Mapping Continuous Versus Categorical Data
Continuous Data: Subnational HIV prevalence in Africa
Categorical Data: Land Types in Africa
12. What is Selection?
Choosing one or more features by their geometry or attribute information
Within same layer or between several layers
Example:
All districts with a population of greater than 50,000 people
All districts within the Western Region
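The two example selections above can be sketched as simple attribute filters. This is only an illustration: the district records are made up, `select` is a hypothetical helper, and it assumes the region is stored as an attribute. A selection by geometry would test each feature's shape against the region boundary instead.

```python
# Hypothetical district records; names and figures are invented for illustration.
districts = [
    {"name": "District A", "region": "Western", "population": 72000},
    {"name": "District B", "region": "Western", "population": 31000},
    {"name": "District C", "region": "Eastern", "population": 58000},
    {"name": "District D", "region": "Northern", "population": 12000},
]


def select(features, predicate):
    """Return the features for which predicate(feature) is true."""
    return [f for f in features if predicate(f)]


# All districts with a population of greater than 50,000 people
large = select(districts, lambda d: d["population"] > 50000)

# All districts within the Western Region (region held as an attribute here)
western = select(districts, lambda d: d["region"] == "Western")

# Both conditions combined into one selection
large_western = select(
    districts,
    lambda d: d["population"] > 50000 and d["region"] == "Western",
)

print([d["name"] for d in large])
print([d["name"] for d in western])
print([d["name"] for d in large_western])
```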
13. Asking questions of your data
Basic examination of your data
Can look at one attribute at a time or multiple attributes together
Understand your data before deciding how best to display it
Subset your data to highlight specific information
Querying
14. Key Points
The 3 Ws of map making
The data WHAT?
The area WHERE?
The audience WHO?
The 3 most common methods to classify data
Natural breaks (Jenks)
Equal interval
Quantile
Querying allows you to ask questions of your data
Querying allows you to make sub-sets of your data to refine your map
#3: This session will begin with a presentation on displaying data, including a discussion of categorical versus continuous data and an overview of three major classification schemes. I will then move on to discuss querying and selecting data and how they help us make more effective maps.
Following this presentation, there will be a demonstration on how to display and query data in QGIS.
#4: Generally, the factors to consider when choosing how to display your data include the distribution of the data, the part of the world the map focuses on, the purpose the map will be used for, and the audience who will be using the map.
#5: The type of data that you have determines the type of mapping that can be used.
How you classify your data is itself a form of analysis that can either illuminate or hide important spatial patterns.
The most basic distinction we make in mapping is between continuous (quantitative) and categorical data.
Continuous data is represented by measurements on a continuous scale. For example, the weight, height, or age of respondents in a survey would represent continuous variables. Here we see the percent of HIV-positive people aggregated to the subnational level for selected countries in Africa. This is a continuous data scheme, whereby the percent of HIV-infected individuals is represented on a continuous scale from less than 1% to just under 30%.
On the other hand, categorical data represent information about values that can be grouped. For example, a person's gender, occupation, or marital status are categorical types of data. The map of major land types in Africa here is a good example of mapping categorical data as a way to compare the relative proportions among our five land type categories.
Some variables could be considered in either way. For example, a person's age is usually considered a continuous variable; however, we may consider it a categorical variable if we group it into 5 categories: child, teenager, young adult, middle age, and senior.
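As a rough sketch of turning a continuous variable into a categorical one, the age example can be written as a lookup against a set of cut-offs. The cut-off ages below are assumptions chosen only for illustration; the presentation names the five categories but does not define their boundaries.

```python
import bisect

# Assumed boundaries (not from the presentation): each value is the first age
# that belongs to the NEXT category.
AGE_BOUNDS = [13, 20, 40, 65]
AGE_LABELS = ["child", "teenager", "young adult", "middle age", "senior"]


def age_category(age):
    """Map a continuous age in years to one of the five categories."""
    # bisect_right counts how many boundaries are <= age, which is the
    # index of the category the age falls into.
    return AGE_LABELS[bisect.bisect_right(AGE_BOUNDS, age)]


for age in (7, 13, 35, 64, 80):
    print(age, age_category(age))
```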
#6: Mapping Categorical Data
Here we see a map that shades Kenya's regions according to region name. This is categorical data because the data is easily grouped into clear categories.
#7: Unlike the previous map, where you saw patterns for discrete categories of data, here we see an example of mapping continuous data in the form of population density in Africa.
To make them easier to handle, continuous variables are usually grouped into "class intervals." This map showing population density in Africa, for example, is broken into 6 classes.
These ranges provide a generalized view of the underlying variable where similar values are collapsed into a small number of categories. When you work with classified data you exchange the detail of the original values for a "big picture" view of the data.
#8: At its heart, classification is an exercise in categorization. We assign locations to categories in order to reduce the complexity of the real world, thereby creating an abstraction that helps us better understand particular characteristics of the world without the distraction of all of the other possible characteristics that we could examine.
I will discuss 3 main classification techniques today: natural breaks, equal interval, and quantile.
Each of these schemes is more or less useful for mapping data with particular types of statistical distributions. For example, the equal interval scheme works best for data with a rectangular distribution (i.e., approximately equal numbers of observations over the data range). It is not very effective for highly skewed data, because most observations are forced into one or two classes while other classes are left empty, which makes for a very uninteresting map.
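The effect of skew on equal interval classes can be seen by counting how many observations land in each class. The sketch below uses two made-up data sets, one evenly spread and one with a single extreme value; `equal_interval_counts` is a hypothetical helper, not a function from QGIS.

```python
def equal_interval_counts(values, n_classes):
    """Count the observations falling into each equal-width class."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    if width == 0:
        # All values are identical, so everything sits in the first class.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value belongs to the last class, not a new one.
        idx = min(int((v - lo) / width), n_classes - 1)
        counts[idx] += 1
    return counts


uniform = list(range(1, 21))               # evenly spread, 1 to 20
skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 100]  # one extreme value

print(equal_interval_counts(uniform, 4))
print(equal_interval_counts(skewed, 4))
```

With the evenly spread data every class holds the same number of observations. With the skewed data nearly everything falls into the first class and the middle classes are empty.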
#9: Here's a map showing the percent of children receiving vaccinations in Tanzania using the natural breaks classification scheme.
Classification by natural breaks (Jenks) uses a calculation that places class breaks at groupings inherent in the data: similar values are kept together in the same class, and the differences between classes are maximized.
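To make the idea concrete, here is a minimal sketch that finds natural breaks by brute force. It tries every way of cutting the sorted values into classes and keeps the one with the smallest total squared deviation from the class means. Because the total variation in the data is fixed, this is the same as maximizing the differences between classes. This is not how QGIS computes Jenks breaks: exhaustive search is only practical for a handful of values, and the sample values are invented.

```python
from itertools import combinations


def sum_sq_dev(group):
    """Sum of squared deviations of a group from its own mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)


def natural_breaks(values, n_classes):
    """Return the classes (lists of sorted values) with the least
    within-class variation, found by trying every possible split."""
    data = sorted(values)
    best_groups, best_cost = None, float("inf")
    # Pick n_classes - 1 cut positions between consecutive sorted values.
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        edges = (0,) + cuts + (len(data),)
        groups = [data[a:b] for a, b in zip(edges, edges[1:])]
        cost = sum(sum_sq_dev(g) for g in groups)
        if cost < best_cost:
            best_groups, best_cost = groups, cost
    return best_groups


# Hypothetical percentages with three visible clusters.
sample = [42, 44, 45, 60, 62, 63, 90, 93, 95]
print(natural_breaks(sample, 3))
```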
#10: The equal interval classification method divides the range of the data into classes with equal-sized ranges. This is done by figuring out the range of the data and dividing that range by the number of classes desired.
For example, in the data set presented here showing the percent of children receiving vaccinations in Tanzania, the values range from 42% to 95% and are divided into four equal-sized classes, each about 13 percentage points wide (53 / 4 ≈ 13.25).
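The calculation described above (range divided by number of classes) is short enough to write out. The sketch below takes the stated endpoints of 42 and 95 at face value and uses a hypothetical helper name; with four classes the class width works out to 13.25 percentage points.

```python
def equal_interval_breaks(lo, hi, n_classes):
    """Return the n_classes + 1 class boundaries for an equal interval scheme."""
    width = (hi - lo) / n_classes
    return [lo + i * width for i in range(n_classes + 1)]


breaks = equal_interval_breaks(42, 95, 4)
print(breaks)  # [42.0, 55.25, 68.5, 81.75, 95.0]
```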
#11: The quantile method divides the data set so that each class contains an equal number of observations. For example, in a dataset with 20 observations and 4 classes, each class would contain 5 observations, or 25% of the total.
By dictating a certain number of sample points per class, quantile classification schemes can sometimes create classes that cover very different ranges of data values, as shown in the map above, where the first class spans 30% while the last class spans only 5%.
Data values and classes aside, this method produces maps that have an apparent balance; that is to say, each class is represented equally.
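A small sketch of the quantile method shows both properties at once: equal counts per class and unequal class widths. The 20 values are invented so that they are spread out at the low end and bunched at the high end, and `quantile_classes` is a hypothetical helper. Real tools also have to decide how to handle tied values, which this sketch ignores.

```python
def quantile_classes(values, n_classes):
    """Split the sorted values into classes holding (nearly) equal counts."""
    data = sorted(values)
    n = len(data)
    # Cut positions at 0, n/k, 2n/k, ..., n, rounded to whole observations.
    edges = [round(i * n / n_classes) for i in range(n_classes + 1)]
    return [data[a:b] for a, b in zip(edges, edges[1:])]


# Hypothetical percentages: 20 observations, 4 classes.
values = [42, 48, 55, 63, 72, 74, 76, 78, 80, 82,
          83, 84, 85, 86, 87, 90, 91, 92, 94, 95]

classes = quantile_classes(values, 4)
sizes = [len(c) for c in classes]
spans = [c[-1] - c[0] for c in classes]

print(sizes)  # observations per class
print(spans)  # range of values covered by each class
```

Each class holds five observations, yet the first class covers a far wider range of values than the last.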
#12: After you have decided on your classification scheme, you need to evaluate whether your map readers will be able to physically see the differences in the symbol set you will use. For example, if you are creating a choropleth (graduated color) map, most map readers will only be able to distinguish six or seven different value levels, so your map should not exceed six or seven classes.