Understanding Modeled Data and Observed Data in Google Analytics [GA4]

I have always wanted to write about Google Analytics’ (GA4) data modeling capabilities. While scrolling through the GA4 documentation on behavior modeling for consent mode, I came across a subheading referencing “Modeled Data Vs Observed Data,” which compelled and inspired me to write an article on this topic in a manner understandable to everyone, regardless of their technical background and GA4 experience.

The information I’ll share in this article will be easy to comprehend, though not as comprehensive as the official documentation.

For more detail, you can always refer to the GA4 documentation or watch the Google team’s YouTube video on data modeling in GA4. In this article, I’ll use basic examples to help you grasp the concepts of observed data and modeled data in Google Analytics.

The current version of Google Analytics, GA4, can model data in your analytics property for various purposes, such as key event (conversion) and behavioral reporting, attribution, and providing predictive insights using predictive metrics.

It can seamlessly integrate and display both modeled and observed data in your GA4 reports, provided that modeling requirements are met for the analytics property.

In GA4, you may notice differences when comparing reports that include modeled data with those that use only observed data (a term I’ll explain later in this guide). For example, reports that include modeled data might show higher user and event counts.

Unlike Universal Analytics, Google Analytics (GA4) can report data using both observed and modeled data. This improvement addresses the growing need to respect user privacy, comply with privacy regulations such as GDPR and CCPA, and honor signals such as Global Privacy Control (GPC), while still providing the necessary data for decision-making.

Observed Data in Google Analytics (GA4)

In GA4, observed data refers to user data that has been consented to and collected using persistent user identifiers such as User ID, Google Signals, and Device ID. This type of data is similar to what you’re familiar with in Universal Analytics.

As the name suggests, observed data does not include the traffic data of users who did not grant consent, assuming you have implemented a consent banner (commonly known as a cookie banner) to allow your website visitors to have control over how their data is used and collected.

For example, if you have a consent banner on your website and, out of 130 visitors, 30 did not consent to data collection, the data collected from the remaining 100 visitors who did consent is what is referred to as observed data in Google Analytics.
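The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the numbers above; the function name `consent_summary` and its fields are hypothetical and are not part of GA4 or any Google API.

```python
def consent_summary(total_visitors: int, declined: int) -> dict:
    """Split traffic into observed (consented) and unobserved visitors."""
    if total_visitors <= 0 or not 0 <= declined <= total_visitors:
        raise ValueError("expected total_visitors > 0 and 0 <= declined <= total_visitors")
    observed = total_visitors - declined
    return {
        "observed": observed,        # visitors whose data GA4 can collect with identifiers
        "unobserved": declined,      # visitors who declined the consent banner
        "acceptance_rate": round(observed / total_visitors, 3),
    }


summary = consent_summary(total_visitors=130, declined=30)
print(summary)  # {'observed': 100, 'unobserved': 30, 'acceptance_rate': 0.769}
```

In other words, roughly 77% of the traffic in this scenario is observed, and the remaining 23% is the gap that modeling tries to estimate.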

Relying solely on observed data can be risky if you don’t have a strategy in place to address data gaps, especially if you can’t determine your consent acceptance rate or have a poor consent acceptance rate. These data gaps can prevent you from clearly understanding user behavior on your website or the performance of your marketing tactics.

However, GA4 offers the capability to fill some of these gaps using machine learning, making it a powerful tool even if it doesn’t provide an exact picture.

It’s important to note that observed data isn’t only affected by consent banners. Factors such as ad blockers, browser security and privacy settings, and technology can also impact the amount of observed data recorded in your analytics property.

Additionally, Google uses observed data and other signals to train the machine learning models that fill data gaps, providing better reporting. Across the different modeling types, this helps account for privacy-related gaps caused by users declining consent banners and by the other factors mentioned above.

Modeled Data in Google Analytics (GA4)

In simple terms, modeled data is an estimate Google provides in GA4 to help you better understand your website’s performance. This estimate is generated using machine learning to fill in the gaps caused by factors such as consent banners, technological limitations, and security restrictions on data collection.

It uses your observed data as training data and other signals that Google employs for key event (conversion) and behavioral modeling. These signals include:

  • Event type
  • Location data
  • Referrer
  • Date and time
  • Non-identifiable dimensions associated with the device (e.g., browser, device type)

It’s important to note that these signals vary depending on the specific modeling applied to your Google Analytics property data.

In GA4, Google Analytics applies data modeling to four use cases:

  • Key event modeling (previously conversion modeling)
  • Attribution modeling (data-driven attribution)
  • Behavioral modeling
  • Predictive metrics

Image Source: GA4’s Webinar by Google

This article focuses on helping you understand observed data and modeled data, specifically key event (conversion) and behavioral modeled data reported in Google Analytics.

Behavioral Modeling in Google Analytics 4

Behavioral modeling in GA4 helps you understand the actions of users who did not consent to data collection while ensuring compliance and preserving user anonymity. This means no actual identifiers (e.g., user ID, device ID) are tied to the data.

I have previously written about this topic, covering about 15 popular questions about Behavioral Modeling in Google Analytics (GA4), which I recommend you check out.

When you implement a consent banner on your website or app, Google Analytics may miss data from users who decline consent or give partial consent.

Behavioral modeling, enabled through Google Consent Mode, uses machine learning to model the behavior of users who decline analytics cookies based on the behavior of similar users who accept analytics cookies.

This data modeling use case lets your analytics property display modeled data, so you can gain valuable insights from your GA4 reports while respecting user privacy, as Google claims.

For example, behavioral modeling estimates data you use for reporting user and session metrics, such as daily active users and conversion rates, which may be unobservable when identifiers like cookies or user IDs are unavailable. When users don’t grant consent, events are not associated with a persistent user identifier.

So, with the advanced implementation of Google Consent Mode V2, if Google Analytics anonymously collects a total of 15 page_view events, it can’t determine whether they represent 15 distinct users or a single user. Instead, Analytics uses machine learning to estimate user behavior based on similar consenting users and other signals.
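To make the ambiguity concrete, here is a toy sketch in Python. It is not how Google’s models work (those are not public); the event records, the `client_id` field, and the ratio-based estimate are all assumptions made purely for illustration.

```python
# Hypothetical event records: (event_name, client_id). client_id is None
# when the visitor declined consent, so no persistent identifier exists.
consented_events = [
    ("page_view", "user_a"), ("page_view", "user_a"), ("page_view", "user_a"),
    ("page_view", "user_b"), ("page_view", "user_b"),
    ("page_view", "user_c"),
]
unconsented_events = [("page_view", None)] * 15


def distinct_users(events):
    """Count distinct identifiers; events without one cannot be counted."""
    return len({cid for _, cid in events if cid is not None})


def naive_user_estimate(consented, unconsented):
    """Toy estimate: assume decliners view as many pages per user as consenters."""
    views_per_user = len(consented) / distinct_users(consented)
    return len(unconsented) / views_per_user


print(distinct_users(unconsented_events))                         # 0
print(naive_user_estimate(consented_events, unconsented_events))  # 7.5
```

Counting identifiers on the 15 unconsented events yields nothing, which is exactly the gap described above. The ratio-based guess only shows the general idea of borrowing behavior from consenting users; a real model would rely on many more signals.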

Additionally, you should note that Google does not use any fingerprinting technology or mechanism, as mentioned in the analytics documentation. Instead, it relies solely on signals obtained from unconsented data and observed data.

If you are setting up Google Consent Mode, check out the DumbData FREE Google Consent Mode Impact Estimator & Planning Tool to estimate and communicate the implementation’s impact to your client or stakeholders.

And, if you are debugging your consent mode setup, I also have a guide that provides steps for debugging your Google Consent Mode V2 implementation, with tools and tips for auditing your privacy compliance.

For those not using Google Analytics but still wanting to set up consent mode with Google Ads products, I have a guide for Piwik Pro users on integrating Piwik Pro and Google Consent Mode.

Limitations

While modeled data in GA4 helps cover data gaps caused by declines in consent, it has limitations:

  • You can’t use behavioral-modeled data for audience creation in GA4. However, you can use predictive metrics to build a predictive audience, which GA4 derives by modeling your observed data.
  • Your behavioral-modeled data is not supported in user explorer, cohort, and user lifetime exploration reports.
  • Modeled behavioral data cannot be used when creating segments with sequences.
  • Google Analytics retention reports do not support the use of modeled data.
  • Predictive metrics do not include behavioral-modeled data, as the data is not tied to specific identifiable users within GA4.
  • Modeled data cannot be exported (e.g., BigQuery export), although this might change.

Key Event Modeling (previously Conversion Modeling) in Google Analytics (GA4)

Key event modeling in GA4 helps report conversions that might otherwise be lost due to privacy restrictions, technological limitations, and declines in consent banners.

It’s important to note that key event modeling differs from attribution modeling. However, both are black boxes, and both provide insight into a marketing channel’s true value without changing the number of conversions collected in your analytics property.

Google uses modeling to estimate online conversions that cannot be directly observed. It then uses machine learning to assign links between ad interactions and conversions, accounting for cases where cookies and identifiers weren’t available.

Google claims that this enables accurate conversion reporting without identifying users, which is particularly useful for addressing user privacy concerns, technical constraints, or cross-device user activity.

Their documentation also mentions that it can help optimize advertising campaigns and improve automated bidding.

Google’s models identify trends between directly observed and unobserved conversions.

For instance, if conversions observed on one browser are similar to unattributed conversions on another browser, the machine learning model predicts overall attribution. Conversions are then aggregated to include both modeled and observed data.

Core reports (such as Event, Conversions, and Attribution reports) and Explorations where event-scoped dimensions are selected will include modeled data. These reports automatically attribute conversion events across channels using a combination of observed and modeled data.

Critical Points about Key Event Modeling

First-Party Cookies Lifespan:

Browsers that limit the time window for first-party cookies will have key events (beyond the window) modeled. For example, if a visitor’s browser deletes first-party cookies after 24 hours, key event modeling can fill the gap when the user returns later to complete a key event action (conversion).
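The 24-hour scenario can be sketched as a simple check in Python. This is a simplification under stated assumptions: real browsers apply their own cookie expiry rules, and the function name `is_directly_observable` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def is_directly_observable(first_visit, key_event_time, cookie_lifetime):
    """True if the cookie set at first_visit still exists when the key event occurs."""
    return key_event_time - first_visit <= cookie_lifetime


first_visit = datetime(2024, 5, 1, 9, 0)
lifetime = timedelta(hours=24)  # the browser limit used in the example above

same_day = is_directly_observable(first_visit, datetime(2024, 5, 1, 18, 0), lifetime)
three_days_later = is_directly_observable(first_visit, datetime(2024, 5, 4, 9, 0), lifetime)
print(same_day, three_days_later)  # True False
```

The second case is the kind of key event that can no longer be tied to the original visit through the cookie, so it falls to modeling.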

Growing Consent Requirements:

Some regions, like those under GDPR and LGPD, require consent before using cookies for digital measurement, advertising, etc. When advertisers use consent mode, key events are modeled for users who have not consented.

Apple’s App Tracking Transparency (ATT) Policy:

This policy requires developers to obtain permission to use certain information from other apps and websites. Google does not use information (such as IDFA) that falls under the ATT policy. Key events from ATT-impacted ads traffic are modeled.

Cross-Device Interactions:

When the ad interaction and the key event occur on different devices, key events may be modeled.

View True Ad Interactions:

Key event modeling covers both click-based events and engaged views for YouTube, aiding in attribution for engaged-view key events.

Imported Google Ads Conversions:

Any Google Ads conversions created based on Google Analytics key events will include modeled data.

According to Google’s documentation, when analyzing attribution reports in Google Analytics, keep in mind that attributed conversion data for each channel can be updated for up to 12 days after the conversion is recorded. This is due to ongoing processing and model training. For increased accuracy, select a date range that extends back beyond the previous week, so that your analysis does not rest only on the most recent days, whose numbers may still change.
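As a rough aid for picking a date range, the 12-day window can be turned into a small Python helper. This is a sketch based only on the 12-day figure quoted above; the function names are hypothetical, and the exact cut-off GA4 applies internally is an assumption.

```python
from datetime import date, timedelta

# The 12-day figure is the one quoted from Google's documentation in this article.
ATTRIBUTION_UPDATE_WINDOW_DAYS = 12


def may_still_change(conversion_date: date, today: date) -> bool:
    """True while attributed data for this conversion date can still be updated."""
    return (today - conversion_date).days <= ATTRIBUTION_UPDATE_WINDOW_DAYS


def latest_settled_date(today: date) -> date:
    """Most recent conversion date that is already past the update window."""
    return today - timedelta(days=ATTRIBUTION_UPDATE_WINDOW_DAYS + 1)


today = date(2024, 6, 30)
print(may_still_change(date(2024, 6, 20), today))  # True
print(latest_settled_date(today))                  # 2024-06-17
```

Ending a report’s date range on or before the settled date gives attribution numbers that should no longer shift between one viewing and the next.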

Final Thoughts

We have explored the concepts of observed data and modeled data in Google Analytics, briefly touching on the different use cases of data modeling in GA4.

Our focus then narrowed to behavioral and key event (conversion) modeled data, explaining the purpose of each and the key points to keep in mind.

If you have any feedback, corrections, or additional insights, please reach out via the DumbData Contact Us page or personally on LinkedIn. Happy measuring.
