Cookieless Tracking with GA4

Published in Analytics, Data Privacy, Tracking | 07.11.2021

Due to its new data model, cookieless tracking with GA4 requires some additional configuration to work properly. Unlike in Universal Analytics, there is no built-in setting for it. In his blog, Mark Edmondson has already suggested what a solution might look like. However, his approach is of limited use, because the GA4 protocol relies on additional parameters that it does not take into account.

In the following, I present an approach that enables functional cookieless GA4 tracking using Consent Mode, server-side Google Tag Manager, and a database accessible via a REST API.

Introduction

Since the use of cookies has been increasingly restricted by privacy laws and browser settings, attention has increasingly turned to alternative mechanisms or even to abandoning user identities altogether. What was still relatively easy with Universal Analytics has become more difficult with GA4, because the storage: 'none' setting available in UA has no equivalent in the GA4 gtag script.

A way out is offered by the new Consent Mode, which centrally controls whether analytics or marketing tags may use browser storage. Each cookie access is allowed or blocked based on the user's consent. However, Consent Mode also decides whether data appears in the Google Analytics interface at all: data collected without consent is processed separately by Google and analyzed with machine learning, but it is not included in the reported data. With server-side Tag Manager, this setting can be overridden so that all data is reported.

Tracking without cookies with Consent Mode and GA4

Consent Mode offers four settings relevant for tracking: analytics_storage and ad_storage can each be set to denied or granted, deciding for all Analytics or all Advertising tags, respectively, whether they may access browser storage. Setting analytics_storage to denied prevents GA4 from setting cookies.

At the beginning of each page load these modes are set and stored in the dataLayer:

Consent Mode in the dataLayer
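As a minimal sketch of what such an entry looks like, the snippet below sets a default consent state with the gtag.js consent command. The dataLayer here is a plain array standing in for the browser's window.dataLayer so the code runs in Node.js; which values a site actually sets depends on its own consent setup.

```javascript
// Stand-in for the browser's global dataLayer (window.dataLayer in a real page).
const dataLayer = [];

// Same shape as the standard gtag() stub: it pushes its arguments object onto the dataLayer.
function gtag() {
  dataLayer.push(arguments);
}

// Default consent state, set before any GA4 tag fires: no cookies for Analytics or Ads.
gtag('consent', 'default', {
  analytics_storage: 'denied',
  ad_storage: 'denied',
});

// A consent banner could later switch storage on, e.g.:
// gtag('consent', 'update', { analytics_storage: 'granted' });

console.log(dataLayer[0][0], dataLayer[0][1], dataLayer[0][2]);
```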

If Consent Mode is active, a parameter named gcs is appended to the Google Analytics requests. If the user agreed to the use of browser storage, this parameter gets the value G111; otherwise it is set to G100. Consequently, only requests with the value G111 show up in Google Analytics, while requests with the value G100 are used by Google for conversion attribution afterwards.

g/collect?v=2
&tid=G-ABCDEFGH
&gtm=2reb31
&_p=739263894
&sr=1680x1050
&gcs=G100                   // Consent-Value
&ul=de-de
&cid=GA123456.12314516
&_fplc=0
&ir=1
&_s=1
&dl=...
&dr=...
&sid=1636287718
&sct=1
&seg=0
&en=page_view
&_fv=1
&_ss=1
&_eu=Q
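The consent flag can be read directly from such a query string. The helper below is a hypothetical sketch (consentFromHit is our own name). It assumes the commonly observed gcs format of "G1" followed by one digit for ad_storage and one for analytics_storage, with 1 meaning granted, which matches G111 and G100 as described above.

```javascript
// Hypothetical helper: returns the consent state encoded in a GA4 hit's gcs parameter.
// Assumed format: "G1" + ad_storage digit + analytics_storage digit (1 = granted, 0 = denied),
// so G111 means both granted and G100 means both denied.
function consentFromHit(queryString) {
  const gcs = new URLSearchParams(queryString).get('gcs');
  const match = /^G1([01])([01])$/.exec(gcs || '');
  if (!match) {
    return null; // Consent Mode inactive, or a format this sketch does not know
  }
  return {
    ad_storage: match[1] === '1' ? 'granted' : 'denied',
    analytics_storage: match[2] === '1' ? 'granted' : 'denied',
  };
}

// Shortened query string of the example request above
const hit = 'v=2&tid=G-ABCDEFGH&gcs=G100&cid=GA123456.12314516&en=page_view&_fv=1&_ss=1';
console.log(consentFromHit(hit)); // { ad_storage: 'denied', analytics_storage: 'denied' }
```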

To track without cookies, we want a request with gcs=G100, which sets no cookies and yet appears fully in Google Analytics. Let's look at how this works, based on an example developed by Mark Edmondson.

Server-side processing of the cookieless GA4 event

Server-side Tag Manager can process each incoming request before forwarding it to its actual destination. This allows additional protection of user data, for example by removing the IP address from the request.

The Consent Mode setting can also be changed after the fact, by setting the event attribute x-ga-gcs of a GA4 request to G111 (credit to Mark Edmondson):

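const claimRequest = require('claimRequest');
const extractEventsFromMpv2 = require('extractEventsFromMpv2');
const isRequestMpv2 = require('isRequestMpv2');

if (isRequestMpv2()) {
  // Claim the request
  claimRequest();

  const events = extractEventsFromMpv2();
  const max = events.length - 1;
  events.forEach((event, i) => {

    // Make unconsented hits appear in GA
    const consentMode = event['x-ga-gcs'];
    if (consentMode === "G100") {
      event['x-ga-gcs'] = "G111";
    }

    ...
  });
}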

This change ensures that a GA4 request sent under denied consent still shows up in Google Analytics. As the attentive reader will have noticed, however, this alone yields little more than a larger data volume: each GA4 request consists of a number of attributes that are normally (without Consent Mode) collected or generated in the user's browser. These include:

  • Client ID (cid)
  • Session ID (sid)
  • Session Count (sct)
  • First Visit (_fv)
  • User Engagement (seg)

Most of these parameters are also stored and read client-side: in addition to the client ID cookie _ga, GA4 uses a second cookie called _ga_{datastream ID} that stores additional session data. Based on this data, the gtag script determines whether the hit is a first visit, starts a new session, and belongs to an engaged session.

If GA4 does not set cookies, this information is regenerated on every single page view, so each view appears as the first, interaction-free session of a new visitor. This raises the question of how this information can be obtained without relying on cookies.

To prevent each subsequent event from creating a new user in Google Analytics, a temporary identity is needed. The client ID represents a user's identity in Google Analytics and is easy to replace with a hash value. Similar to Matomo, the user agent and the IP address of the request serve as inputs for the hash, and a (tag-specific) salt turns it into a temporary user ID:

const getRemoteAddress = require('getRemoteAddress');
const getRequestHeader = require('getRequestHeader');
const sha256Sync = require('sha256Sync');

// Get User-Agent and IP from incoming request
const ua = getRequestHeader('user-agent');
const ip = getRemoteAddress();

// Pick your own salt: it can be anything
const salt = 'add a random sentence';

// Create a hashed ID from the IP and User Agent
const hash = sha256Sync(salt + ip + ua, {outputEncoding: 'hex'});

// Inside the events.forEach() loop: always overwrite the IP so the real address is not
// forwarded, keep the request's user agent, and use the server-side hash as client ID.
event.ip_override = "0.0.0.0";
if (!event.user_agent && ua) event.user_agent = ua;
if (hash) event.client_id = hash;
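To try the hashing outside the sGTM sandbox, the same idea can be sketched with Node.js's built-in crypto module. serverClientId is a hypothetical name, and the optional day argument is our own addition rather than part of the setup described here: with a fixed salt, the same IP and user agent always produce the same ID, so folding the current date into the salt is one way to make the ID genuinely temporary.

```javascript
const crypto = require('crypto');

// Node.js counterpart of the sha256Sync() call: SHA-256 over salt + IP + user agent, hex-encoded.
// The optional `day` argument is an assumption of this sketch, not part of the original setup:
// mixing the date into the salt makes the resulting ID change once per day.
function serverClientId(salt, ip, userAgent, day = '') {
  return crypto.createHash('sha256').update(salt + day + ip + userAgent).digest('hex');
}

const salt = 'add a random sentence';
const ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)';
const ip = '203.0.113.7'; // documentation address (TEST-NET-3)

const idDay1 = serverClientId(salt, ip, ua, '2021-11-08');
const idDay1Again = serverClientId(salt, ip, ua, '2021-11-08');
const idDay2 = serverClientId(salt, ip, ua, '2021-11-09');

console.log(idDay1 === idDay1Again); // true: same visitor, same day
console.log(idDay1 === idDay2);      // false: the next day yields a new ID
```

In a live setup, the day value could be derived with new Date().toISOString().slice(0, 10), which gives the current UTC date.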

With this hash, distinct users can at least be told apart. However, all other session attributes listed above are still regenerated with every request, and they cannot simply be replaced by a hash of existing information. A different approach is needed to preserve this information temporarily without storing it in the user's browser: for example, a database that is queried server-side via a REST API to determine the session attributes from a reduced set of information.

Adding session information from a database via REST API

The architecture of server-side Tag Manager allows requests to be sent not only to Google Analytics, Google Ads or Facebook, but also to your own endpoints. The prerequisites for this approach are the two API functions sendHttpGet and sendHttpRequest.

An endpoint that reports when a user last interacted with the website, and whether they have visited it at all, makes it possible to determine the session ID, the session counter, the first-visit attribute and the user engagement attribute. This requires a database with the following fields:

  • Last Timestamp (to determine the time since the last call)
  • User ID (or server hash to distinguish different visitors)
  • Session ID
  • Session Counter

With such a database in place, this information can be stored from the first request onward and updated on each subsequent one, so that the corresponding GA4 attributes can be adjusted. Using the user ID as the primary key lets the database be queried directly for a given visitor.
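To make the database side more concrete, here is a hypothetical, in-memory sketch of such a REST API, keyed by the server-side user ID. The routes (GET and PUT on /users/{id}, POST on /users), the status codes and the function name handle are assumptions for illustration; a real deployment would put a persistent database and an HTTP server (for example Node's http module) behind it.

```javascript
// Hypothetical in-memory stand-in for the session database behind the REST API.
// Assumed routes:
//   GET  /users/{id}  returns the stored row, or 404 if the visitor is unknown
//   PUT  /users/{id}  merges the JSON body into the existing row
//   POST /users       creates a row; the JSON body must contain `cid`
const rows = new Map(); // primary key: server-side user ID (the hash)

function handle(method, path, body = '{}') {
  const match = /^\/users(?:\/([0-9a-f]+))?$/.exec(path);
  if (!match) return { status: 404, body: '{}' };
  const id = match[1]; // undefined for the collection path /users

  if (id && (method === 'GET' || method === 'PUT') && !rows.has(id)) {
    return { status: 404, body: '{}' };
  }
  if (id && method === 'GET') {
    return { status: 200, body: JSON.stringify(rows.get(id)) };
  }
  if (id && method === 'PUT') {
    const updated = { ...rows.get(id), ...JSON.parse(body) };
    rows.set(id, updated);
    return { status: 200, body: JSON.stringify(updated) };
  }
  if (!id && method === 'POST') {
    const row = JSON.parse(body);
    if (!row.cid) return { status: 400, body: '{}' };
    rows.set(row.cid, row);
    return { status: 201, body: JSON.stringify(row) };
  }
  return { status: 405, body: '{}' };
}

// Usage: create a visitor, then look them up
handle('POST', '/users', JSON.stringify({ cid: 'ab12', session_id: '1636287718', session_number: 1, server_timestamp: 1636287718000 }));
console.log(handle('GET', '/users/ab12').status); // 200
console.log(handle('GET', '/users/ff00').status); // 404
```

The client code treats every non-2xx answer to the lookup as a new visitor, so answering unknown IDs with 404 here has the same effect as the 500 response described for the third case.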

Three different states can then be distinguished:

  1. the user was already on the page and has an active session
  2. the user was already on the page and starts a new session
  3. the user is new to the page and starts their first session

1. The user has already been to the page and has an active session.

A session is active if the difference between the last timestamp and the current time is less than 30 minutes. In this scenario, the user already exists in the database and already has a session ID. Since the session already contains at least one earlier hit, it is treated as engaged. Accordingly, the following attributes of the GA4 request are modified:

  • The parameters ga_session_id and ga_session_number get the respective value from the database.
  • The parameter x-ga-mp2-seg (Engaged Session) is changed from 0 to 1
  • The parameters ['x-ga-system_properties'].fv (First Visit) and ['x-ga-system_properties'].ss (Session Start) are removed.

Finally, the timestamp in the database is updated to the current time, and the process is finished.

2. The user was already on the page and starts a new session.

If the last timestamp is 30 minutes old or older, this is a new session. Again, the user is already in the database. Accordingly:

  • The ga_session_id parameter is taken from the original request.
  • The parameter ga_session_number is set to the session counter from the database, incremented by one.
  • The parameter ['x-ga-system_properties'].fv (First Visit) is removed.

Then the timestamp, session ID and session counter are updated in the database.

3. The user is new to the site and starts their first session.

In this scenario, no user entry exists in the database yet. Consequently, the lookup request receives a 500 response code and ends unsuccessfully.

In this case, a second request adds a new row with the required parameters to the database.
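The three cases can be condensed into a single pure function, which makes the branching easy to test outside server-side Tag Manager. This is a sketch under our own assumptions: applySessionState and its return shape are hypothetical, row is the visitor's database row (or null when the lookup failed), and event is a plain object carrying the GA4 parameters named above.

```javascript
const SESSION_TIMEOUT_MS = 30 * 60 * 1000;

// Hypothetical pure-function version of the three cases.
// cid:   server-side user ID (hash) of the visitor
// row:   the visitor's database row, or null if the lookup failed
// event: GA4 parameters of the incoming hit (not modified; a copy is returned)
// now:   current time in milliseconds
function applySessionState(cid, row, event, now) {
  const out = { ...event, 'x-ga-system_properties': { ...event['x-ga-system_properties'] } };
  const props = out['x-ga-system_properties'];

  // 3. New visitor: keep the hit as it is and create the database row
  if (!row) {
    return {
      event: out,
      write: {
        method: 'POST',
        body: { cid, session_id: event.ga_session_id, session_number: 1, server_timestamp: now },
      },
    };
  }

  delete props.fv; // a known visitor is never on their first visit

  // 1. Active session: reuse session ID and counter, mark as engaged, no session start
  if (now - row.server_timestamp < SESSION_TIMEOUT_MS) {
    out.ga_session_id = row.session_id;
    out.ga_session_number = row.session_number;
    out['x-ga-mp2-seg'] = '1';
    delete props.ss;
    return { event: out, write: { method: 'PUT', body: { server_timestamp: now } } };
  }

  // 2. New session of a known visitor: keep the hit's session ID, increment the counter
  out.ga_session_number = Number(row.session_number) + 1;
  return {
    event: out,
    write: {
      method: 'PUT',
      body: { session_id: event.ga_session_id, session_number: out.ga_session_number, server_timestamp: now },
    },
  };
}
```

The function copies the event instead of modifying it, and a gap of exactly 30 minutes already counts as a new session. It removes fv and ss with delete, whereas the sGTM code below sets them to null.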

Code for database queries

All calculations and queries together result in the following code:

const JSON = require('JSON');
const getRequestHeader = require('getRequestHeader');
const getTimestampMillis = require('getTimestampMillis');
const makeInteger = require('makeInteger');
const sendHttpGet = require('sendHttpGet');
const sendHttpRequest = require('sendHttpRequest');
const setResponseBody = require('setResponseBody');
const setResponseHeader = require('setResponseHeader');
const setResponseStatus = require('setResponseStatus');

// Runs inside the events.forEach() loop of the client, so `event` (the current GA4 event)
// and `hash` (the server-side client ID from the previous step) are in scope.

// User Client ID for query
const queryCid = hash;

// Base URL where the REST API can be reached; the user ID is appended, so end it with "/"
const requestUrl = '< URL of REST API >';

// Current time, session timeout (30 minutes) and default flag, which may be updated
const currentTime = getTimestampMillis();
const sessionTimeout = 1000 * 60 * 30;
let newUser = false;

// Response headers (incl. CORS) for the browser's request to the server container
setResponseHeader('content-type', 'application/json');
setResponseHeader('access-control-allow-credentials', 'true');
setResponseHeader('access-control-allow-origin', getRequestHeader('origin'));

// Passes the REST API's answer to a write request on as the response
const onWrite = (statusCode, headers, body) => {
  setResponseStatus(statusCode);
  setResponseBody(body);
  setResponseHeader('cache-control', headers['cache-control']);
};
const jsonHeaders = {'content-type': 'application/json'};

// Look up the visitor's row by their server-side ID
sendHttpGet(requestUrl + queryCid, (statusCode, headers, body) => {

  if (statusCode >= 200 && statusCode < 300) {

    const userData = JSON.parse(body);
    const systemProperties = event['x-ga-system_properties'];

    // Time difference between the last stored timestamp and the current time
    const timeDiff = currentTime - userData.server_timestamp;

    if (userData.session_id && timeDiff < sessionTimeout) {
      // 1. Less than 30 minutes: active session, treat as engaged
      event.ga_session_id = userData.session_id;
      event.ga_session_number = userData.session_number;
      event['x-ga-mp2-seg'] = '1';
      if (systemProperties) {
        systemProperties.fv = null;
        systemProperties.ss = null;
      }

      // Only the timestamp needs to be updated
      sendHttpRequest(requestUrl + queryCid, onWrite,
        {headers: jsonHeaders, method: 'PUT', timeout: 500},
        JSON.stringify({server_timestamp: currentTime}));

    } else if (userData.session_id) {
      // 2. 30 minutes or more: new session of a known visitor
      if (systemProperties) systemProperties.fv = null;
      const newSct = makeInteger(userData.session_number) + 1;
      event.ga_session_number = newSct;

      // Store the new session ID (taken from the request), counter and timestamp
      sendHttpRequest(requestUrl + queryCid, onWrite,
        {headers: jsonHeaders, method: 'PUT', timeout: 500},
        JSON.stringify({
          session_number: newSct,
          session_id: event.ga_session_id,
          server_timestamp: currentTime
        }));
    }

    setResponseStatus(200);
    setResponseBody(body);

  } else {
    // 3. Lookup failed (e.g. 500): treat as a new visitor
    newUser = true;
    setResponseStatus(500);
    setResponseBody('{}');
  }

  if (newUser) {
    // Create a new row for the visitor's first session
    sendHttpRequest(requestUrl, onWrite,
      {headers: jsonHeaders, method: 'POST', timeout: 500},
      JSON.stringify({
        cid: queryCid,
        session_id: event.ga_session_id,
        session_number: 1,
        name: '',
        server_timestamp: currentTime
      }));
  }

  // The adjusted `event` is only complete at this point, so it should be passed on
  // to the container (e.g. via runContainer) from within this callback.
});

Within a client template for server-side Tag Manager, this routine turns the parameters of a cookieless GA4 hit into a coherent GA4 request that counts as a meaningful interaction in the GA4 interface.

Summary

This setup lays the groundwork for cookieless tracking with GA4: a GA4 tag fires on the client side, and its request is then processed and anonymized by a custom client template on the server. This is initially sufficient for session-based measurement of website visitors in Google Analytics. No more user data is collected than is strictly necessary to obtain a temporary user and session identity, along with the GA4 parameters for engagement, number of previous sessions, and first visit. At the same time, personally identifiable information such as the IP address is removed from the request, which prevents it from being used for subsequent tracking or profiling.

Extended accordingly, this setup enables other interesting use cases. For example, the server-side database makes it possible to enrich the data with additional information stored there beforehand. In this way, you can build a system yourself that would otherwise require commercial tools such as Snowplow or Segment.
