Fundamentals of Data Visualization (2024)

Whenever we visualize data, we take data values and convert them in a systematic and logical way into the visual elements that make up the final graphic. Even though there are many different types of data visualizations, and at first glance a scatter plot, a pie chart, and a heatmap don’t seem to have much in common, all these visualizations can be described with a common language that captures how data values are turned into blobs of ink on paper or colored pixels on screen. The key insight is the following: All data visualizations map data values into quantifiable features of the resulting graphic. We refer to these features as aesthetics.

2.1 Aesthetics and types of data

Aesthetics describe every aspect of a given graphical element. A few examples are provided in Figure 2.1. A critical component of every graphical element is of course its position, which describes where the element is located. In standard 2d graphics, we describe positions by an x and y value, but other coordinate systems and one- or three-dimensional visualizations are possible. Next, all graphical elements have a shape, a size, and a color. Even if we are preparing a black-and-white drawing, graphical elements need to have a color to be visible, for example black if the background is white or white if the background is black. Finally, to the extent we are using lines to visualize data, these lines may have different widths or dash–dot patterns. Beyond the examples shown in Figure 2.1, there are many other aesthetics we may encounter in a data visualization. For example, if we want to display text, we may have to specify font family, font face, and font size, and if graphical objects overlap, we may have to specify whether they are partially transparent.

Figure 2.1: Commonly used aesthetics in data visualization: position, shape, size, color, line width, line type. Some of these aesthetics can represent both continuous and discrete data (position, size, line width, color) while others can usually only represent discrete data (shape, line type).

All aesthetics fall into one of two groups: those that can represent continuous data and those that cannot. Continuous data values are values for which arbitrarily fine intermediates exist. For example, time duration is a continuous value. Between any two durations, say 50 seconds and 51 seconds, there are arbitrarily many intermediates, such as 50.5 seconds, 50.51 seconds, 50.50001 seconds, and so on. By contrast, number of persons in a room is a discrete value. A room can hold 5 persons or 6, but not 5.5. For the examples in Figure 2.1, position, size, color, and line width can represent continuous data, but shape and line type can usually only represent discrete data.

Next we’ll consider the types of data we may want to represent in our visualization. You may think of data as numbers, but numerical values are only two out of several types of data we may encounter. In addition to continuous and discrete numerical values, data can come in the form of discrete categories, in the form of dates or times, and as text (Table 2.1). When data is numerical we also call it quantitative and when it is categorical we call it qualitative. Variables holding qualitative data are factors, and the different categories are called levels. The levels of a factor are most commonly without order (as in the example of “dog”, “cat”, “fish” in Table 2.1), but factors can also be ordered, when there is an intrinsic order among the levels of the factor (as in the example of “good”, “fair”, “poor” in Table 2.1).

Table 2.1: Types of variables encountered in typical data visualization scenarios.
Type of variable	Examples	Appropriate scale	Description
quantitative/numerical continuous	1.3, 5.7, 83, 1.5x10^-2	continuous	Arbitrary numerical values. These can be integers, rational numbers, or real numbers.
quantitative/numerical discrete	1, 2, 3, 4	discrete	Numbers in discrete units. These are most commonly but not necessarily integers. For example, the numbers 0.5, 1.0, 1.5 could also be treated as discrete if intermediate values cannot exist in the given dataset.
qualitative/categorical unordered	dog, cat, fish	discrete	Categories without order. These are discrete and unique categories that have no inherent order. These variables are also called factors.
qualitative/categorical ordered	good, fair, poor	discrete	Categories with order. These are discrete and unique categories with an order. For example, “fair” always lies between “good” and “poor”. These variables are also called ordered factors.
date or time	Jan. 5 2018, 8:03am	continuous or discrete	Specific days and/or times. Also generic dates, such as July 4 or Dec. 25 (without year).
text	The quick brown fox jumps over the lazy dog.	none, or discrete	Free-form text. Can be treated as categorical if needed.
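
As a minimal sketch of how the difference between unordered and ordered factors might look in code, the following Python snippet defines a hypothetical `Factor` class. The class name, its fields, and its methods are illustrative assumptions and do not come from any particular library. It stores the levels alongside the values and only allows comparisons between levels when the factor is declared as ordered.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Factor:
    """Hypothetical container for a qualitative variable and its levels."""

    values: Sequence[str]
    levels: Sequence[str]
    ordered: bool = False

    def __post_init__(self) -> None:
        # Every value must be one of the declared levels.
        unknown = set(self.values) - set(self.levels)
        if unknown:
            raise ValueError(f"values not among the levels: {sorted(unknown)}")

    def codes(self) -> List[int]:
        """Integer position of each value within the list of levels."""
        index = {level: i for i, level in enumerate(self.levels)}
        return [index[value] for value in self.values]

    def precedes(self, a: str, b: str) -> bool:
        """Compare two levels; only meaningful for ordered factors."""
        if not self.ordered:
            raise TypeError("levels of an unordered factor cannot be compared")
        levels = list(self.levels)
        return levels.index(a) < levels.index(b)


# Unordered factor: the order of the levels carries no meaning.
pets = Factor(values=["dog", "cat", "fish", "cat"], levels=["cat", "dog", "fish"])

# Ordered factor: "fair" always lies between "poor" and "good".
quality = Factor(
    values=["good", "fair", "poor"],
    levels=["poor", "fair", "good"],
    ordered=True,
)
```

Calling `quality.precedes("poor", "fair")` is allowed because the factor is ordered, whereas `pets.precedes("cat", "dog")` raises a `TypeError`, which mirrors the idea that unordered categories have no inherent order.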

To examine a concrete example of these various types of data, take a look at Table 2.2. It shows the first few rows of a dataset providing the daily temperature normals (average daily temperatures over a 30-year window) for four U.S. locations. This table contains five variables: month, day, location, station ID, and temperature (in degrees Fahrenheit). Month is an ordered factor, day is a discrete numerical value, location is an unordered factor, station ID is similarly an unordered factor, and temperature is a continuous numerical value.

Table 2.2: First 12 rows of a dataset listing daily temperature normals for four weather stations. Data source: NOAA.
Month	Day	Location	Station ID	Temperature
Jan	1	Chicago	USW00014819	25.6
Jan	1	San Diego	USW00093107	55.2
Jan	1	Houston	USW00012918	53.9
Jan	1	Death Valley	USC00042319	51.0
Jan	2	Chicago	USW00014819	25.5
Jan	2	San Diego	USW00093107	55.3
Jan	2	Houston	USW00012918	53.8
Jan	2	Death Valley	USC00042319	51.2
Jan	3	Chicago	USW00014819	25.3
Jan	3	San Diego	USW00093107	55.3
Jan	3	Death Valley	USC00042319	51.3
Jan	3	Houston	USW00012918	53.8

2.2 Scales map data values onto aesthetics

To map data values onto aesthetics, we need to specify which data values correspond to which specific aesthetics values. For example, if our graphic has an x axis, then we need to specify which data values fall onto particular positions along this axis. Similarly, we may need to specify which data values are represented by particular shapes or colors. This mapping between data values and aesthetics values is created via scales. A scale defines a unique mapping between data and aesthetics (Figure 2.2). Importantly, a scale must be one-to-one, such that for each specific data value there is exactly one aesthetics value and vice versa. If a scale isn’t one-to-one, then the data visualization becomes ambiguous.

Figure 2.2: Scales link data values to aesthetics. Here, the numbers 1 through 4 have been mapped onto a position scale, a shape scale, and a color scale. For each scale, each number corresponds to a unique position, shape, or color and vice versa.
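
The one-to-one requirement can be sketched in a few lines of Python. The helper names `make_discrete_scale` and `invert_scale`, as well as the specific shapes and colors, are hypothetical choices made for illustration and are not taken from Figure 2.2. The sketch rejects any mapping in which a data value or an aesthetic value appears more than once, so that the mapping can always be read in both directions.

```python
def make_discrete_scale(data_values, aesthetic_values):
    """Pair data values with aesthetic values, insisting on a one-to-one mapping."""
    data_values = list(data_values)
    aesthetic_values = list(aesthetic_values)
    if len(data_values) != len(aesthetic_values):
        raise ValueError("need exactly one aesthetic value per data value")
    if len(set(data_values)) != len(data_values):
        raise ValueError("a data value appears more than once")
    if len(set(aesthetic_values)) != len(aesthetic_values):
        raise ValueError("scale is ambiguous: an aesthetic value is reused")
    return dict(zip(data_values, aesthetic_values))


def invert_scale(scale):
    """Read the scale backwards, from aesthetic value to data value."""
    return {aesthetic: data for data, aesthetic in scale.items()}


# Hypothetical shape and color choices for the numbers 1 through 4.
shape_scale = make_discrete_scale([1, 2, 3, 4], ["square", "diamond", "circle", "triangle"])
color_scale = make_discrete_scale([1, 2, 3, 4], ["blue", "orange", "green", "purple"])

# Because the scale is one-to-one, a reader can recover the data value from the shape.
shape_to_number = invert_scale(shape_scale)
```

Reusing a shape, as in `make_discrete_scale([1, 2], ["circle", "circle"])`, raises a `ValueError`: a reader seeing a circle could no longer tell which data value it stands for.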

Let’s put things into practice. We can take the dataset shown in Table 2.2, map temperature onto the y axis, day of the year onto the x axis, location onto color, and visualize these aesthetics with solid lines. The result is a standard line plot showing the temperature normals at the four locations as they change during the year (Figure 2.3).

Figure 2.3: Daily temperature normals for four selected locations in the U.S. Temperature is mapped to the y axis, day of the year to the x axis, and location to line color. Data source: NOAA.
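
The two continuous position scales behind a plot like Figure 2.3 can be sketched as simple linear maps. This is an illustration under stated assumptions, not the code used to produce the figure: the function names, the pixel ranges (730 by 400), the temperature domain (0 to 120 degrees Fahrenheit), and the non-leap reference year 2018 are all hypothetical.

```python
from datetime import date


def linear_scale(domain, out_range):
    """Return a function that maps values linearly from domain onto out_range."""
    d0, d1 = domain
    r0, r1 = out_range
    if d0 == d1:
        raise ValueError("domain must span more than a single value")

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


def day_of_year(month, day, year=2018):
    """Convert a month and day into a day-of-year number (assumed non-leap year)."""
    return date(year, month, day).timetuple().tm_yday


# Day of the year goes onto the x axis, temperature onto the y axis.
# The y range is reversed because pixel rows are usually counted from the top.
x_scale = linear_scale(domain=(1, 365), out_range=(0.0, 730.0))
y_scale = linear_scale(domain=(0.0, 120.0), out_range=(400.0, 0.0))

# Position of the Chicago value for Jan 3 from Table 2.2 (25.3 degrees Fahrenheit).
chicago_jan3 = (x_scale(day_of_year(1, 3)), y_scale(25.3))
```

Because each function is strictly monotonic over its domain, every day of the year and every temperature lands at its own position, which keeps both scales one-to-one.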

Figure 2.3 is a fairly standard visualization for a temperature curve and likely the visualization most data scientists would intuitively choose first. However, it is up to us which variables we map onto which scales. For example, instead of mapping temperature onto the y axis and location onto color, we can do the opposite. Because now the key variable of interest (temperature) is shown as color, we need to show sufficiently large colored areas for the color to convey useful information (Stone, Albers Szafir, and Setlur 2014). Therefore, for this visualization I have chosen squares instead of lines, one for each month and location, and I have colored them by the average temperature normal for each month (Figure 2.4).

Figure 2.4: Monthly normal mean temperatures for four locations in the U.S. Data source: NOAA.
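
The two steps behind this kind of figure, averaging per month and location and then mapping the averages onto color, can be sketched as follows. The sketch uses only the twelve rows of Table 2.2, so the resulting values are means over the first three days of January rather than true monthly normals. The function names, the color endpoints, and the domain of 20 to 60 degrees Fahrenheit are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# The twelve rows of Table 2.2 as (month, day, location, temperature in Fahrenheit).
rows = [
    ("Jan", 1, "Chicago", 25.6), ("Jan", 1, "San Diego", 55.2),
    ("Jan", 1, "Houston", 53.9), ("Jan", 1, "Death Valley", 51.0),
    ("Jan", 2, "Chicago", 25.5), ("Jan", 2, "San Diego", 55.3),
    ("Jan", 2, "Houston", 53.8), ("Jan", 2, "Death Valley", 51.2),
    ("Jan", 3, "Chicago", 25.3), ("Jan", 3, "San Diego", 55.3),
    ("Jan", 3, "Death Valley", 51.3), ("Jan", 3, "Houston", 53.8),
]


def mean_by_month_and_location(rows):
    """Average the temperatures within each (month, location) group."""
    groups = defaultdict(list)
    for month, _day, location, temperature in rows:
        groups[(month, location)].append(temperature)
    return {key: mean(temps) for key, temps in groups.items()}


def sequential_color_scale(domain, low_rgb, high_rgb):
    """Map values in domain onto colors interpolated between two RGB endpoints."""
    lo, hi = domain

    def scale(value):
        if not lo <= value <= hi:
            raise ValueError(f"{value} lies outside the scale domain {domain}")
        t = (value - lo) / (hi - lo)
        return tuple(round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb))

    return scale


means = mean_by_month_and_location(rows)
# Hypothetical endpoints: dark blue for cold, pale yellow for warm.
color_scale = sequential_color_scale((20.0, 60.0), (20, 20, 90), (250, 230, 120))
tile_colors = {key: color_scale(value) for key, value in means.items()}
```

Rounding to integer RGB channels means that very close temperatures can end up with the same color, so in practice such a scale is one-to-one only up to the resolution of the color space.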

I would like to emphasize that Figure 2.4 uses two position scales (month along the x axis and location along the y axis) but neither is a continuous scale. Month is an ordered factor with 12 levels and location is an unordered factor with four levels. Therefore, the two position scales are both discrete. For discrete position scales, we generally place the different levels of the factor at an equal spacing along the axis. If the factor is ordered (as is the case here for month), then the levels need to be placed in the appropriate order. If the factor is unordered (as is the case here for location), then the order is arbitrary, and we can choose any order we want. I have ordered the locations from overall coldest (Chicago) to overall hottest (Death Valley) to generate a pleasant staggering of colors. However, I could have chosen any other order and the figure would have been equally valid.
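
A discrete position scale of this kind can be sketched as a function that places the levels of a factor at equal spacing, in the order in which they are given. The function name and the unit spacing are hypothetical. The text only states that Chicago is the coldest and Death Valley the hottest location, so the order of the two middle locations below is an assumption.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Assumed order from coldest to hottest; only the two endpoints are given in the text.
LOCATIONS = ["Chicago", "San Diego", "Houston", "Death Valley"]


def discrete_position_scale(levels, spacing=1.0):
    """Place each level at an equally spaced position, in the order given."""
    levels = list(levels)
    if len(set(levels)) != len(levels):
        raise ValueError("each level may appear only once")
    return {level: i * spacing for i, level in enumerate(levels)}


# Month is an ordered factor, so the level order is fixed by the calendar.
x_scale = discrete_position_scale(MONTHS)

# Location is unordered, so any permutation of LOCATIONS would be equally valid.
y_scale = discrete_position_scale(LOCATIONS)

# Grid position of the tile for Houston in March.
houston_march = (x_scale["Mar"], y_scale["Houston"])
```

Passing the locations in a different order changes only where each row is drawn, not which data the figure shows.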

Figures 2.3 and 2.4 each use three scales in total: two position scales and one color scale. This is a typical number of scales for a basic visualization, but we can use more than three scales at once. Figure 2.5 uses five scales (two position scales, one color scale, one size scale, and one shape scale), and each scale represents a different variable from the dataset.

Figure 2.5: Fuel efficiency versus displacement, for 32 cars (1973–74 models). This figure uses five separate scales to represent data: (i) the x axis (displacement); (ii) the y axis (fuel efficiency); (iii) the color of the data points (power); (iv) the size of the data points (weight); and (v) the shape of the data points (number of cylinders). Four of the five variables displayed (displacement, fuel efficiency, power, and weight) are numerical continuous. The remaining one (number of cylinders) can be considered to be either numerical discrete or qualitative ordered. Data source: Motor Trend, 1974.

References

Stone, M., D. Albers Szafir, and V. Setlur. 2014. “An Engineering Model for Color Difference as a Function of Size.” In 22nd Color and Imaging Conference. Society for Imaging Science and Technology.
