4 Methods To “Proactively” Redact PII In Google Analytics (GA4)

This article explores four methods for proactively redacting personally identifiable information (PII) in Google Analytics (GA4), each used alongside the built-in “Data Redaction” feature.

Google doesn’t want you collecting your website visitors’ personally identifiable information (PII) and transmitting it to its analytics servers. The same strictness applies to its next-generation analytics solution, Google Analytics 4, popularly known as “GA4.” Google’s policies dictate that no data that Google could use or recognise as PII should be sent to its analytics solution.

The recent introduction of the data redaction feature in Google Analytics (GA4) further underscores Google’s commitment to protecting PII in their analytics tool. I have previously covered this feature in an article on Datola and shared insightful perspectives from Brias Calvo during our conversation.

However, out of the box, the only proactive aspect of this feature is the automatic “email redaction,” which conceals email addresses before they are sent to the Google Analytics server for processing. Yet PII extends well beyond email addresses.

In this article, we will explore a proactive approach to redacting PII in Google Analytics (GA4) rather than a reactive one where you wait until you’ve collected personally identifiable data about your website visitors before taking action in GA4.

The GA4 PII redaction solutions we will discuss can effectively redact the following types of personal data and are compatible with the Google Analytics data redaction feature:

  • Email addresses (both encoded and decoded)
  • Phone numbers
  • Passwords
  • Names
  • Addresses
  • Geographical coordinates
  • Postal codes
  • Credit card details (Mastercard, Amex, and Visa)
  • Social Security numbers

I will demonstrate these proactive methods, but first, I’ll briefly explain what PII entails and how it finds its way into Google Analytics.

What Constitutes PII and Google’s Perspective on PII

PII, which stands for “Personally Identifiable Information,” refers to data that, on its own, can directly identify, contact, or precisely locate an individual. 

An intriguing aspect of PII is that its definition and what qualifies as PII can vary across different privacy regulations, such as GDPR, DPA, HIPAA, LGPD, and others. However, they all share certain commonalities. 

Therefore, in their documentation, Google recommends consulting with legal counsel to interpret what constitutes Personally Identifiable Information for your business, depending on the privacy framework applicable to your business entity.

In the context of GA4, let’s examine how Google defines PII and what they advise against sending to their platform:

  • Email addresses
  • Home or mailing addresses
  • Phone numbers
  • Precise locations (including GPS coordinates, with a caveat)
  • Full names or usernames

According to Google, user IDs occupy a somewhat neutral ground: they are permissible personal data, provided they are not sent as a name, email address, or phone number.

Since this article does not focus on a specific privacy regulation, we will emphasise how to prevent the storage of such data within your business’s Google Analytics property, with a primary focus on GA4. How does this type of data end up in your GA4 property? The following section delves into this question.

How Does PII Find Its Way Into Your Google Analytics (GA4) Data?

Personally identifiable data can end up in your GA4 property through various channels, most commonly automatic collection, manual data transmission, or users entering the information themselves, although the last of these is relatively rare. Here are some of the ways PII can inadvertently enter the data collected in GA4:

  • Page URLs and page titles
  • Event properties data (particularly parameters)
  • Data imports (especially user data import)
  • User IDs (when email serves as the user ID sent to GA4)
  • Site search data (when users manually enter such information into the search form, although this is infrequent)
  • User properties that may contain PII

Details of each item can be found in Google’s documentation.

Numerous methods are available for identifying the collection of personally identifiable data in your GA4 Property. To streamline the discussion, I will forgo a detailed exploration of these methods, both the straightforward and more challenging ones, and instead dive directly into the proactive measures you can implement.

How to Proactively Redact PII in Google Analytics (GA4)

This section of the article discusses the four proactive methods for redacting personally identifiable data in GA4, offering protective measures beyond email addresses and Google’s conventional definition of personally identifiable information. These methods enable you to take a proactive stance in redacting PII that could find its way into your Google Analytics property.

By utilising the techniques outlined in this blog post alongside the GA4 data redaction feature, you can actively redact data classified as personally identifiable data points rather than adopting a reactive approach to masking these data types. Let’s explore these methods:

  1. Utilising JavaScript in Conjunction with the GA4 Redact Feature
  2. Making Use of a Custom GTM Template in Combination with the GA4 Redact Feature
  3. Leveraging the GA4 Redact Feature Alongside Possible PII Query Keys
  4. Collaborating with Your Development and Marketing Teams in Tandem with the GA4 Redact Feature

Before delving into each method, it’s worth taking a brief overview of the GA4 Redact Data feature, as it is applicable to all the approaches discussed in this blog post.

Where Can You Find and Utilise the GA4 Data Redaction Feature?

The Google Analytics data redaction feature is accessible within your GA4 data stream settings, and it offers two significant functionalities: automatic email redaction and PII query parameter value redaction based on the keys you’ve specified to be targeted for redaction. 

Enabling the first option ensures that GA4 automatically redacts all email addresses (encoded and decoded) found in your data stream’s event data. The query-parameter-based redaction targets the following dimensions:

  • page_location
  • page_referrer
  • page_path
  • link_url
  • video_url
  • form_destination

When you select the “Redact Data” option, it opens the redaction settings view.

You’ll have to activate the email redaction setting and specify the query parameter keys whose values you consider personally identifiable data about your website visitors. Afterwards, click the save button.

You can use the debug view in GA4, or a tool such as Analytics Debugger, to see the redaction take effect.

Email redaction also applies to email addresses found in the event parameters of GA4 events.

According to Google, the “email redaction” will be automatically enabled for newly created Google Analytics properties.

It’s crucial to note that redaction occurs on the client side, ensuring that data never reaches Google’s servers.

Now that we’ve covered this aspect, let’s delve into the outlined methods.

Method 1: Proactively Redacting PII in GA4 Using JavaScript + GA4 Redact Feature

This solution, originally developed by DumbData for redacting Personally Identifiable Information (PII) in Piwik Pro Analytics, has been adapted for Google Analytics users due to its robust redaction capabilities. The JavaScript code allows you to redact various categories of PII data that might enter your Google Analytics via the strings within the page URLs. These categories include:

  • Email addresses (both encoded and decoded)
  • Phone numbers
  • Passwords
  • Names
  • Addresses
  • Geographical coordinates
  • Postal codes
  • Credit card details (Mastercard, Amex, and Visa)
  • Social Security numbers

While the primary focus of redaction is on the page URL and any associated page metadata, it can also extend to PII entered in search fields if you use the enhanced measurement site search event to track search activity on your website.

The process commences by creating a new custom JavaScript variable in Google Tag Manager and then pasting the provided JavaScript code into this variable. Assign a descriptive name to the variable and save it.

function() {
  // Returns the current page URL with likely PII values replaced by placeholders.
  function redactURL() {
    var url = window.location.href;

    // Redact emails (plain "@" and URL-encoded "%40")
    url = url.replace(/(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b|\b[A-Za-z0-9._%+-]+%40[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b)/gi, '[REDACTED_EMAIL]');

    // Redact phone numbers (10-digit patterns with an optional country code)
    url = url.replace(/\b(\+\d{1,2}\s?)?(\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}\b/gi, '[REDACTED_PHONE]');

    // Redact phone numbers passed as query values (key=value).
    // Note: in the key-based patterns, \b applies only to the first key, so the
    // remaining keys also match as suffixes of longer keys (e.g. "user_lastname=").
    url = url.replace(/(\btel=|telephone=|phone=|mobile=|mob=)([^&\s]*)/gi, '$1[REDACTED_PHONE2]');

    // Redact passwords
    url = url.replace(/(\bpassword=|passwd=|pass=)([^&\s]*)/gi, '$1[REDACTED_PASSWORD]');

    // Redact names
    url = url.replace(/(\bfirstname=|lastname=|fullname=|fname=|lname=|surname=|first_name=|last_name=|fn=|ln=|username=)([^&\s]*)/gi, '$1[REDACTED_NAME]');

    // Redact addresses
    url = url.replace(/(\baddress=|street=|road=|drive=|pobox=|po%20box=|po_box=|address_street=|address_name=)([^&\s]*)/gi, '$1[REDACTED_ADDRESS]');

    // Redact other location information
    url = url.replace(/(\baddress_country_code=|address_state=|lat=|lon=)([^&\s]*)/gi, '$1[REDACTED_LOCATION]');

    // Redact zip codes
    url = url.replace(/(\bpostcode=|zipcode=|zip=|address_zip=)([^&\s]*)/gi, '$1[REDACTED_ZIPCODE]');

    // Redact credit card (Visa/MC, 16 digits).
    // $1 and $2 keep the surrounding delimiters so the URL stays parseable.
    url = url.replace(/([=,;]\s*)(?:\d{4}[-\s+]*){3}\d{4}($|[,;:/?&#])/gi, '$1[REDACTED_PAYMENT_CARD]$2');

    // Redact credit card (Amex, 15 digits)
    url = url.replace(/([=,;]\s*)\d{4}[-\s+]*\d{6}[-\s+]*\d{5}[-\s+]*($|[,;:/?&#])/gi, '$1[REDACTED_PAYMENT_CARD]$2');

    // Redact social security number
    url = url.replace(/([=,;]\s*)\d{3}[-\s+]*\d{2}[-\s+]*\d{4}($|[,;:/?&#])/gi, '$1[REDACTED_SSN]$2');

    return url;
  }

  // Return the redacted URL
  return redactURL();
}

*The code is customisable if you need to adjust which query keys are treated as holding PII values.
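One way to make the code easier to adjust is to keep the query keys in plain arrays and build each pattern from them, so adding a key becomes a one-line change. The sketch below is only an illustration: the `redactByKeys` helper, the extra `dob` key, and the sample URL are assumptions, not part of the original code. Unlike the patterns above, it only matches whole keys that start right after `?`, `&`, or `#`.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: redact the value of every listed query key in a URL string.
// Keys match case-insensitively and only when they start right after ?, & or #.
function redactByKeys(url, keys, label) {
  var pattern = new RegExp(
    '([?&#](?:' + keys.map(escapeRegExp).join('|') + ')=)[^&#\\s]*',
    'gi'
  );
  return url.replace(pattern, '$1[' + label + ']');
}

// Key lists to adjust for your own site ("dob" is an assumed extra key).
var nameKeys = ['firstname', 'lastname', 'fullname', 'username'];
var extraKeys = ['dob'];

var sample = 'https://example.com/thanks?firstname=Ada&plan=pro&DOB=1990-01-01';
var redacted = redactByKeys(sample, nameKeys, 'REDACTED_NAME');
redacted = redactByKeys(redacted, extraKeys, 'REDACTED_DOB');
console.log(redacted);
```

Because this variant takes the URL as an argument instead of reading `window.location.href`, it can be tried against sample URLs before it goes anywhere near a live container.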

The next step involves:

  • Navigating to your “Google Tag” or the config tag settings variable.
  • Adding a new field named “page_location” under the configuration.
  • Setting its value to the variable you have just created.

Following this, you can preview your setup, thereby proactively redacting PII in Google Analytics in conjunction with your GA4 redact feature configuration.

One noteworthy aspect of this method is that even if you have implemented GA4 without Google Tag Manager, you can still utilise the JavaScript code with a minor adjustment.
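For a gtag.js installation, the adjustment might look like the sketch below: the redaction logic takes the URL as an argument, and the result is passed as `page_location` in the `config` call. The measurement ID `G-XXXXXXXXXX` is a placeholder, only two of the patterns from the Method 1 code are repeated to keep the example short, and the browser check is an assumption added so the snippet can also be run outside a page.

```javascript
// Same idea as the GTM variable, but taking the URL as an argument.
// Only the email and (shortened) name patterns are repeated here for brevity.
function redactUrlString(url) {
  url = url.replace(/(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b|\b[A-Za-z0-9._%+-]+%40[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b)/gi, '[REDACTED_EMAIL]');
  url = url.replace(/(\bfirstname=|lastname=|fullname=|username=)([^&\s]*)/gi, '$1[REDACTED_NAME]');
  return url;
}

// In the page, this would stand in for the usual gtag('config', ...) line.
if (typeof window !== 'undefined' && typeof window.gtag === 'function') {
  window.gtag('config', 'G-XXXXXXXXXX', {
    page_location: redactUrlString(window.location.href)
  });
}

// Quick check with a made-up URL.
var checkedUrl = redactUrlString('https://example.com/signup?email=jane.doe%40example.com&firstname=Jane');
console.log(checkedUrl);
```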

Method 2: Using Google Tag Manager Template with the Native GA4 Redact Feature

This method is remarkably straightforward. All you need to do is add the GTM variable custom template, named “URL – PII Redactor,” created by “Mikeulrich75.”

I have provided a comprehensive step-by-step guide on how to implement this on Datola, where I discuss this feature in conjunction with the GA4 redact feature. 

Additionally, you’ll need to leverage the regex pattern below to utilise this template alongside your GA4 redact feature proactively. You can see insightful details about this approach in the Datola article.

tel|telephone|phone|mobile|mob|password|passwd|pass|firstname|lastname|fullname|fname|lname|surname|first_name|last_name|fn|ln|username|address|street|road|drive|pobox|po%20box|po_box|address_street|address_name|address_country_code|address_state|lat|lon|postcode|zipcode|zip|address_zip|visacard|visa_card|master_card|mastercard|amex|credit_card
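To see which keys a pattern like this would catch before relying on it, a quick offline check such as the one below can help. It assumes the pattern is evaluated as a case-insensitive regular expression against each query key, which may differ from how the template itself applies it; the shortened pattern, the `flaggedKeys` helper, and the sample URL are illustrative only.

```javascript
// Shortened version of the pattern above, for illustration only.
var keyPattern = new RegExp('tel|phone|password|firstname|lastname|zip|lat|lon', 'i');

// Return the query keys of a URL that the pattern would flag.
function flaggedKeys(url, pattern) {
  var flagged = [];
  new URL(url).searchParams.forEach(function (value, key) {
    if (pattern.test(key)) {
      flagged.push(key);
    }
  });
  return flagged;
}

var flaggedSample = flaggedKeys(
  'https://example.com/?Phone=555&template=a&utm_source=x&zip=90210',
  keyPattern
);
console.log(flaggedSample);
```

Under that assumption, an unanchored pattern also flags keys that merely contain one of the terms, such as `template` (which contains `lat`), so it is worth reviewing the matches against your real URLs.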

Method 3: Leveraging the GA4 Redact Feature with Potential PII Query Keys

This method relies on the existing Google Analytics data redaction capability while taking proactive measures against other PII data types collected through URL query parameters.

You can accomplish this by adding query parameter keys that are known to occasionally contain PII values to the “redact URL query parameters” field. Please be aware that you can include a maximum of thirty (30) query parameters here.

You have the option to select from any of the parameters listed below.

  • telephone
  • phone
  • lastname
  • firstname
  • name
  • zipcode
  • password

Note that the parameters you add are not case sensitive: once a key is found in the page URL, regardless of its letter case, the value it holds gets redacted.
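Since keys are not case sensitive and the field is capped at thirty entries, it can help to tidy a candidate list before entering it. The sketch below is one possible way to do that; the `prepareRedactionKeys` helper and the sample list are assumptions for illustration.

```javascript
var MAX_KEYS = 30; // limit stated above for the redaction field

// Normalise a candidate list: trim, lowercase (keys are not case sensitive),
// drop empty entries and duplicates, and flag lists that exceed the limit.
function prepareRedactionKeys(candidates) {
  var keys = [];
  candidates.forEach(function (candidate) {
    var key = String(candidate).trim().toLowerCase();
    if (key && keys.indexOf(key) === -1) {
      keys.push(key);
    }
  });
  return { keys: keys, overLimit: keys.length > MAX_KEYS };
}

var prepared = prepareRedactionKeys([
  'telephone', 'Phone', 'phone ', 'lastname', 'firstname', 'name', 'zipcode', 'password'
]);
console.log(prepared.keys, prepared.overLimit);
```

Here `Phone` and `phone ` collapse into a single `phone` entry, which saves a slot in a list that has a hard cap.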

Method 4: Collaboration With Your Development and Marketing Teams Alongside the GA4 Redact Feature

Similar to the third method, this approach relies on only the GA4 native redact feature. However, it includes proactively leveraging valuable insights that can be provided only by your development and marketing teams.

In this method, you will convene with your engineering and marketing teams, providing them with a spreadsheet or document to compile a list of query parameter keys that may potentially contain personally identifiable data you wish to redact. Once this list is provided, you can add the identified keys to the query key parameters field within the redaction feature of Google Analytics.

While you’re at it, if the GA4 property is not new, you can also review its historical data to see whether any personally identifiable data has been collected in the past.
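To give both teams a starting point, you could list the query keys that already appear in your page URLs. The sketch below assumes you have exported a set of page URLs (for example, `page_location` values) into an array by whatever means suits you; the `countQueryKeys` helper and the sample URLs are hypothetical.

```javascript
// Count how often each query key (lowercased) appears in a list of page URLs.
function countQueryKeys(urls) {
  var counts = new Map();
  urls.forEach(function (rawUrl) {
    var parsed;
    try {
      parsed = new URL(rawUrl);
    } catch (e) {
      return; // skip values that are not absolute URLs
    }
    parsed.searchParams.forEach(function (value, key) {
      var normalised = key.toLowerCase();
      counts.set(normalised, (counts.get(normalised) || 0) + 1);
    });
  });
  return counts;
}

// Made-up sample of exported URLs.
var exportedUrls = [
  'https://example.com/checkout?email=a%40b.com&step=2',
  'https://example.com/checkout?Email=c%40d.com',
  'https://example.com/search?q=shoes&step=1',
  'not a url'
];

var keyCounts = countQueryKeys(exportedUrls);
keyCounts.forEach(function (count, key) {
  console.log(key + ': ' + count);
});
```

The resulting list of keys can be pasted into the shared spreadsheet so the teams only need to mark which ones may carry personal data.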

Closing Thoughts on PII Redaction In Google Analytics

Congratulations on reaching the conclusion of this blog. However, one more critical point to address is the benefits of adopting a proactive approach when redacting Personally Identifiable Information (PII) in Google Analytics. One key advantage of proactivity is that it spares you the headache of deleting data upon discovering the presence of PII, which can lead to unintended loss of non-PII data due to the behaviour of GA4’s data deletion solution. By being proactive, you can maintain confidence that PII won’t inadvertently seep into your GA4 property.

To recap, we began by examining Google’s stance on the collection and storage of PII in Google Analytics and delved into what it entails and how it finds its way into GA4.

We then explored the GA4 redact feature, its location, and how to navigate it, along with four proactive methods to consider when concealing the personally identifiable data of your website visitors in Google Analytics.

I would personally opt for methods 1 and 2 for my proactive approach. Following that, I would engage with the engineering and marketing teams to identify any potentially missed query keys, which I would then add to the query parameter keys field within the GA4 redaction feature, or use to update the JavaScript code shared in method 1, since that code is customisable.

Furthermore, the initial two methods discussed can also be applied to redact data considered personal information when using ad platform measurement pixels and server-side tagging, which opens up fantastic privacy opportunities for redacting PII sent to advertising vendors like Meta (formerly Facebook), Snapchat Pixel, and others.

Please remember that redacting PII in Google Analytics is just one piece of the puzzle when it comes to ensuring your business’s privacy compliance in how it utilises Google Analytics. It is advisable to undertake this in collaboration with a privacy and legal expert.

I hope you found this article informative and insightful. I would greatly appreciate hearing your thoughts, so please don’t hesitate to reach out to me on LinkedIn to share your feedback.
