Line, Square, Cube: How Dimensions Shape Our Understanding of Analytics
This article was originally written in August 2021.
As analysts, one of our most important responsibilities is to define the success metrics for every project we’re a part of; site sessions, click-throughs, and conversion rates are daily considerations. But while we frequently discuss which KPIs and other data will be collected as quantitative measures of success, there is a whole other side to analytics that is often left as a footnote in these same discussions. And without it, our metrics would lack the context needed to produce meaningful insights. In short, these conversations often lack Dimension.
The starting line.
Dimensions, from an analytical perspective, are the non-measurable qualities associated with an event; more simply, they provide context to anything happening on a website. For example, when a user visits a website, Google Analytics might capture the following metrics:
1 site session,
that visited 3 pages,
clicking 1 item on each page,
and left after having spent 2:37 on the site.
What exactly do we get from this? In aggregate, it can be quite useful information: the user was interested enough not to bounce (leave immediately), and the click actions and time spent point towards an engaged session. However, if we want to understand what the user engaged with, and perhaps start to get at why, we need to look at the dimensions associated with that session:
A woman, aged 25-34, in Tempe, Arizona, United States, North America,
using a Samsung Galaxy 9+ mobile device, running Android 10 on a 360x800 screen,
entered the homepage of the website from an Organic Search for the term “Pasta Salad”,
clicked the Recipes button and went to the Recipes page,
clicked the “Rotini Pasta Salad” article link and went to the Rotini Pasta Salad with Cheese page.
These dimensions give meaning to the aggregate information generated by the metrics; we can now reasonably understand that this user found our site while looking for content related to pasta salad, and quickly navigated to one of the site’s recipes. In this way, dimensions are crucial to any measurement strategy. Following this example, once we know how much site traffic increased in the last month, we next want to know which pages saw increases, what drove users to those pages, and ultimately why that growth occurred; dimensions let us segment data meaningfully and drive insights.
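To make the split concrete, here is a minimal sketch of how that single session might be represented once its metrics and dimensions sit side by side. The field names and page paths are hypothetical and simplified for illustration, not an actual Google Analytics export schema.

```python
# A simplified, hypothetical record for the session described above.
# Metrics are the measurable numbers; dimensions are the qualities
# that give those numbers their context.
session = {
    "metrics": {
        "sessions": 1,
        "pageviews": 3,
        "clicks_per_page": 1,
        "session_duration_seconds": 157,  # 2:37
    },
    "dimensions": {
        "age_bracket": "25-34",
        "gender": "female",
        "city": "Tempe",
        "region": "Arizona",
        "country": "United States",
        "device_category": "mobile",
        "device_model": "Samsung Galaxy 9+",
        "operating_system": "Android 10",
        "screen_resolution": "360x800",
        "channel": "Organic Search",
        "search_term": "Pasta Salad",
        "pages_viewed": ["/", "/recipes", "/recipes/rotini-pasta-salad"],
    },
}
```

Everything under metrics can be summed or averaged across sessions; everything under dimensions can only be used to slice and label those sums.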
A less abstract way to think of dimensions is in the context of a standard analytics report: metrics are all the numbers running along the top of the report, and dimensions are the values running along the side.
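If that raw, hit-level data lived in a table, producing such a report is essentially a group-by: dimension values become the rows and aggregated metrics become the columns. A rough sketch using pandas, with hypothetical column names and made-up values:

```python
import pandas as pd

# Hypothetical hit-level data: one row per session, with dimension
# columns (device_category, channel) and metric columns (pageviews, clicks).
hits = pd.DataFrame([
    {"device_category": "mobile",  "channel": "Organic Search", "pageviews": 3, "clicks": 3},
    {"device_category": "desktop", "channel": "Direct",         "pageviews": 5, "clicks": 2},
    {"device_category": "mobile",  "channel": "Display",        "pageviews": 1, "clicks": 0},
])

# Dimension values run down the side (the index); metrics run along the top (the columns).
report = hits.groupby(["device_category", "channel"])[["pageviews", "clicks"]].sum()
print(report)
```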
Squaring up with dimensions.
The powerful but granular nature of dimensions can quickly become too much of a good thing. Suppose that the website above has 1,000 unique pages. A user might view any page on their Desktop computer, a Tablet, or their Mobile phone. For each of the 3 device types, they could’ve arrived from one of 10 different channels (such as Organic Search or a Display ad), and from any of the ~190 countries and 50 US states that could be used as geographic points of differentiation.
Running a report that accounted for just these dimensions, and asking how many times users viewed each page, would result in a table that could have more than 7 million unique data points and would, effectively, be unusable. However, if we didn’t use any dimensions, the resulting report would only show the volume of traffic for each page, and we’d be left wondering what actions drove meaningful changes to traffic volume.
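The 7-million figure falls straight out of multiplying the dimension cardinalities together; a quick back-of-the-envelope check (treating every combination as possible, which real traffic never fully fills):

```python
# Back-of-the-envelope cardinality check for the report described above.
pages = 1_000
device_types = 3
channels = 10
geographies = 190 + 50   # ~190 countries plus 50 US states

possible_rows = pages * device_types * channels * geographies
print(f"{possible_rows:,}")  # 7,200,000 potential dimension combinations
```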
These potential issues with dimensions and data creep are not just hypothetical. In my time as an analyst, I’ve had analytics work delayed or prevented on multiple projects due to poor dimension handling. For example, an existing site’s poorly implemented search results appended every search selection to the URL in the order it was chosen; with dozens of dimensions and hundreds of unique options, the result was a near-infinite number of search outputs and an incredibly messy dataset. Most of the project’s time was spent simply trying to comprehend the data available, rather than analyzing it.
A very different project involved campaign reporting, where dimensions had been well-defined before the campaign launched but were not planned against properly once creative went into market, and most ads ran with no directly competing ad on our servers. This meant that, despite having plenty of potentially useful data and well-defined dimensions, the one thing we could not say conclusively at the end of the campaign was which content generally performed best, or why, and that happened to be what the client wanted to hear most.
In both of the above scenarios, the issue wasn’t necessarily that data didn’t exist, but rather that the data collected didn’t properly account for its eventual analysis. The result was that a significant portion of time, which could’ve been spent analyzing the data, was instead used to clean the data up for that analysis; with deadlines and other project constraints, this sometimes means missed opportunities for exploration.
Fortunately, while dimensions can get out of hand quite quickly when not properly accounted for, there are a number of strategies we can employ to help rein them in during the life of a project.
Thinking outside the box.
The first solution is to be upfront about which dimensions matter most. For instance, Google Analytics captures not just whether a user came to the website from a phone or desktop, but also notes the device model, software version, browser, browser version, and screen resolution for each session. For most websites, these dimensions generally only come up when discussing breakpoints for site development and can be mostly ignored; just be aware of their existence for the cases when, say, a new app launch makes it meaningful to differentiate between iOS and Android users.
To ensure that stakeholders agree on which dimensions are most critical to a project, we recommend defining the key dimensions for the data at the same time that KPIs and metrics are defined in the measurement plan. This way, when site benchmarking occurs or reports are developed, there is a shared understanding of which segmentations are considered most valuable. And just as many tracked metrics never make it into a measurement plan, there will be similarly unmentioned default dimensions in the final site data.
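One lightweight way to capture that agreement is to record the key dimensions directly alongside each KPI in the measurement plan. The KPIs and dimension lists below are purely hypothetical, just to illustrate the shape such a section might take:

```python
# A hypothetical slice of a measurement plan: each KPI is paired with the
# dimensions stakeholders agreed matter most when segmenting that metric.
measurement_plan = {
    "site_sessions":    {"key_dimensions": ["channel", "device_category", "country"]},
    "recipe_pageviews": {"key_dimensions": ["landing_page", "channel"]},
    "conversion_rate":  {"key_dimensions": ["channel", "device_category"]},
}
```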
Once key dimensions have been agreed upon, another useful method of wrangling them is to “bucket” multiple dimension values into larger groupings. Using the earlier example of our international website, tracking sessions from just the 10 countries that make up the bulk of monthly traffic, plus an “Other” category, drastically reduces the volume of data we’d sift through each month.
Bucketing can be just as helpful when there are only a few values for a dimension. A common example is combining Mobile and Tablet traffic in certain reports; many projects are ultimately only seeking a distinction between “desktop” and “non-desktop” traffic, and for these projects this small change collapses three device categories into two, reducing the quantity of data analyzed or visually reported by a third.
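A minimal sketch of both groupings described above; the country list, labels, and function names are hypothetical, and any real implementation would follow whatever values the analytics platform actually reports:

```python
# Hypothetical list of the 10 countries that drive the bulk of monthly traffic;
# everything else collapses into a single "Other" bucket.
TOP_COUNTRIES = {
    "United States", "United Kingdom", "Canada", "Australia", "Germany",
    "France", "India", "Brazil", "Mexico", "Japan",
}

def bucket_country(country: str) -> str:
    """Collapse ~190 country values into 11 reporting buckets."""
    return country if country in TOP_COUNTRIES else "Other"

def bucket_device(device_category: str) -> str:
    """Collapse Desktop / Mobile / Tablet into Desktop vs. Non-desktop."""
    return "Desktop" if device_category.lower() == "desktop" else "Non-desktop"

print(bucket_country("Portugal"))  # Other
print(bucket_device("Tablet"))     # Non-desktop
```

The payoff is that every downstream report works with 11 country values and 2 device values instead of roughly 190 and 3.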
So, what’s the point?
By defining dimensions early in a project, we can later maximize the time we spend analyzing data. For most clients, we’ll eventually receive access to more data than we could ever easily consume if we gave every available dimension equal weight. However, by recognizing which qualities are most likely to differentiate our users’ behaviors or experiences, we can zero in on a key set of variables and spend more time doing meaningful analysis.
Ultimately, it’s our job to define which metrics and dimensions will be the most useful for our clients, and to ensure that dimensions aren’t treated as an afterthought when it comes time to measure the work we’ve done. Preparing a defined list of dimensions allows for consistent reporting and focused insights, with key intersections of user qualities tracked against each other rather than against a theoretically infinite number of variations.