Google Analytics 4 BigQuery (Service Account) Extractor
What is Google Analytics 4 Extractor?
Google Analytics 4 is an analytics service that measures traffic and engagement across your websites and apps by collecting event-based data from both.
Google Analytics 4 BigQuery (Service Account) extractor by Electrik.AI is designed to export Google Analytics 4 property data from your linked BigQuery dataset and load it into your database. You no longer need to manually export GA4 property data to a database, nor do you need to unnest the property data or merge the multiple events_YYYYMMDD tables yourself.
With Electrik.AI’s Google Analytics 4 BigQuery (Service Account) Extractor, you can get a daily report in the database of your choice. Electrik.AI makes it easy to get the data you want so that you can focus on analysis and reporting.
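For context, here is a sketch of the manual work the extractor automates: merging the daily events_YYYYMMDD tables with a wildcard query and unnesting the nested event_params column. The project ID, dataset ID, and date range are placeholders you would replace with your own.

```sql
-- Placeholder project/dataset; merges daily tables and unnests event parameters.
SELECT
  event_date,
  event_name,
  EP.key AS param_key
FROM
  `your_project_id.your_dataset_id.events_*`,
  UNNEST(event_params) AS EP
WHERE
  _TABLE_SUFFIX BETWEEN '20230801' AND '20230831';
```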
Select the Export Type (Daily or Streaming) to match the setting in your Google Analytics 4 BigQuery Links configuration.
Step 7: Click the Save button on the top right corner of the screen.
Congratulations! You have successfully created a Google Analytics 4 BigQuery (Service Account) connection on Electrik.AI.
What are the supported Data Destinations?
Electrik.AI supports the following databases and data warehouses as destinations for Google Analytics data. It is important to note that Electrik.AI itself is not a destination; a destination is required to export data from the various marketing data sources using Electrik.AI.
How to export Google Analytics 4 Data?
Electrik.AI creates a data flow (or ETL process) that exports Google Analytics 4 property data to a target database, data warehouse, or cloud storage. This process runs daily at a scheduled time and exports data for the previous day.
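Conceptually, each daily run exports the slice of data that a query like the following would return; the project and dataset names are placeholders.

```sql
-- Selects the previous day's daily export table (the slice a daily run exports).
SELECT event_date, event_name, user_pseudo_id
FROM `your_project_id.your_dataset_id.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY));
```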
Follow these steps to create a Data Flow from Google Analytics 4 BigQuery (Service Account) to a destination such as PostgreSQL.
Step 1: Ensure you have the following prerequisites.
Access to Google BigQuery Project (the one Linked to GA4 property)
Access to a Database of your choice (PostgreSQL, MySQL, Azure SQL)
Database Connection Details such as Host, Port, DB Name, DB User.
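As an illustration, the connection details for a PostgreSQL destination look like the following; all values here are hypothetical.

```
Host: db.example.com       -- hypothetical host
Port: 5432                 -- default PostgreSQL port
DB Name: ga4_analytics     -- hypothetical database name
DB User: electrik_etl      -- hypothetical user with write access
```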
Step 2: Sign in to your Electrik.AI account.
Step 3: Select IO in the Electrik.AI dashboard.
Step 4: Select the Connections icon in the leftmost panel and click the Add button to create a Google Analytics 4 BigQuery (Service Account) data extractor flow.
Step 5: Select the Google Analytics 4 BigQuery (Service Account) connection from the list of available sources.
Step 6: Select your existing Google Analytics 4 BigQuery (Service Account) connection in Electrik.AI.
To configure a new connection, refer to this link.
Step 7: Click on a destination of your choice from list of available destinations.
In this example we are choosing PostgreSQL.
Step 8: Select your existing PostgreSQL destination connection in Electrik.AI, or configure a new one.
Step 9: Review the Data Source, Data Destination, and Database Schema, and provide the Success and Failure Notification Emails. Click “Save and Proceed” to continue.
Success Notification Email: Enter a valid email ID to receive the flow success email notification.
Failure Notification Email: Enter a valid email ID to receive the flow failure email notification.
Note: By default, Electrik.AI uses the signed-in user’s email ID.
Congratulations, the flow is created. Please wait while you are redirected to the Flow Details screen.
You have now successfully set up a Google Analytics 4 BigQuery (Service Account) to PostgreSQL flow in Electrik.AI.
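Once the first run completes, you can sanity-check the load in the PostgreSQL destination. The table name below (ga4_events) is an assumption; substitute the table your flow actually creates.

```sql
-- Hypothetical destination table name; counts loaded rows per day in PostgreSQL.
SELECT event_date, COUNT(*) AS row_count
FROM ga4_events
GROUP BY event_date
ORDER BY event_date DESC;
```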
How to find your account_id in Google Analytics
Step 1: Login to your Google Analytics account.
Step 2: Click ‘Admin’ in the lower-left corner of your screen.
Step 3: Click ‘Account Settings’.
Step 4: Copy the ‘Account ID’.
Sample Queries To Get Google Analytics 4 Data
1. Gives event param key and values for a specific event name.
SELECT
event_date,
event_name,
EP.key AS param_key,
COALESCE(value.string_value,CAST(value.int_value AS STRING), CAST(value.float_value AS STRING),
CAST(value.double_value AS STRING)) AS param_value
FROM
`your_project_id.your_dataset_id.events_*`, -- Replace table name.
UNNEST(event_params) AS EP
WHERE
event_name = 'user_engagement' -- Provide required event_name.
AND _TABLE_SUFFIX BETWEEN '20230801' AND '20230831'; -- Replace date range.
2. Gives all available event parameters and counts of their occurrences in your dataset.
SELECT
EP.key AS event_param_key,
COUNT(*) AS occurrences
FROM
`your_project_id.your_dataset_id.events_*`, -- Replace table name.
UNNEST(event_params) AS EP
WHERE
_TABLE_SUFFIX BETWEEN '20230801' AND '20230802' -- Replace date range.
GROUP BY
event_param_key
ORDER BY
event_param_key ASC;
3. Gives event params values for all the available event param keys in your dataset.
SELECT distinct key,
COALESCE(value.string_value,CAST(value.int_value AS STRING),
CAST(value.float_value AS STRING), CAST(value.double_value AS STRING))
AS param_value
FROM
`your_project_id.your_dataset_id.your_table_id`, -- Replace table name.
UNNEST(event_params) AS event_params
-- Use this to filter for a specific key and provide the required key
-- WHERE key in ('ga_session_id','ga_session_number','session_engaged','campaign','medium')
ORDER BY key;
4. Gives list of items.
SELECT event_date, event_timestamp, event_name, user_pseudo_id,items.*
FROM `your_project_id.your_dataset_id.your_table_id`, -- Replace table name.
UNNEST(items) AS items
ORDER BY 3;
5. Gives list of user properties.
SELECT event_date, event_timestamp, event_name, user_pseudo_id,
key,
COALESCE(value.string_value,CAST(value.int_value AS STRING), CAST(value.float_value AS STRING), CAST(value.double_value AS STRING)) AS param_value
FROM `your_project_id.your_dataset_id.your_table_id`, -- Replace table name.
UNNEST(user_properties) AS uprop
ORDER BY 1;
6. Gives event counts by date and by selected event names for a specific period of days.
SELECT
event_date,
event_name,
COUNT(*) AS event_count -- Counts events by date and by event name for a specific period of days
FROM `your_project_id.your_dataset_id.events_*` -- Replace table name.
WHERE
event_name IN ('page_view', 'session_start', 'purchase') -- Pass event_names here.
AND _TABLE_SUFFIX BETWEEN '20230701' AND '20230708' -- Replace date range to get data for a specific date range.
GROUP BY 1, 2;
7. Gives day-wise user count for a specific date range.
SELECT _TABLE_SUFFIX AS event_date, COUNT(DISTINCT user_pseudo_id) AS date_wise_user_count
FROM `your_project_id.your_dataset_id.events_*` -- Replace table name.
WHERE _TABLE_SUFFIX BETWEEN '20230717' AND '20230722' -- Replace date range to get data for a specific date range.
GROUP BY _TABLE_SUFFIX
ORDER BY 1;
8. Gives total user count and new user count for a specific date range.
WITH
new_users AS (
SELECT
user_pseudo_id,
MAX(IF(event_name IN ('first_visit', 'first_open'), 1, 0)) AS is_new_user
-- Replace table name.
FROM `your_project_id.your_dataset_id.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20230701' AND '20230708' -- Replace date range to get data for a specific date range.
GROUP BY 1
)
SELECT
COUNT(*) AS total_user_count,
SUM(is_new_user) AS new_user_count
FROM new_users;
Gives the individual timestamps and values for all 'purchase' events.
SELECT
event_timestamp,
(
SELECT COALESCE(value.int_value, value.float_value, value.double_value)
FROM UNNEST(event_params)
WHERE key = 'value'
) AS event_value
FROM
-- Replace table name.
`your_project_id.your_dataset_id.events_*`
WHERE
event_name = 'purchase'
-- Replace date range.
AND _TABLE_SUFFIX BETWEEN '20230801' AND '20230802';
9. Gives the latest ga_session_id and ga_session_number for specific users during the last ‘N’ days.
DECLARE N INT64 DEFAULT -3; -- Set to minus the number of days to look back (here, the last 3 days).
DECLARE REPORTING_TIMEZONE STRING DEFAULT 'America/Los_Angeles'; -- Replace America/Los_Angeles with the required timezone.
CREATE TEMP FUNCTION GetParamValue(params ANY TYPE, target_key STRING)
AS (
(SELECT `value` FROM UNNEST(params) WHERE key = target_key LIMIT 1)
);
CREATE TEMP FUNCTION GetDateSuffix(date_shift INT64, timezone STRING)
AS (
(SELECT FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(timezone), INTERVAL date_shift DAY)))
);
WITH
userpseudoid AS(
SELECT
DISTINCT user_pseudo_id
FROM
`your_project_id.your_dataset_id.events_*`,
UNNEST(event_params) AS event_params
WHERE KEY IN ('ga_session_id','ga_session_number')
AND _TABLE_SUFFIX BETWEEN '20230517' AND '20230822' -- Replace date range.
)
SELECT DISTINCT
events.user_pseudo_id,
FIRST_VALUE(GetParamValue(event_params, 'ga_session_id').int_value)
OVER (UserWindow) AS ga_session_id,
FIRST_VALUE(GetParamValue(event_params, 'ga_session_number').int_value)
OVER (UserWindow) AS ga_session_number
FROM
-- Replace table name.
`your_project_id.your_dataset_id.events_*` events
INNER JOIN userpseudoid AS upid
ON(upid.user_pseudo_id = events.user_pseudo_id)
WHERE
RIGHT(_TABLE_SUFFIX, 8)
BETWEEN GetDateSuffix(N, REPORTING_TIMEZONE)
AND GetDateSuffix(0, REPORTING_TIMEZONE)
WINDOW UserWindow AS (PARTITION BY events.user_pseudo_id ORDER BY events.event_timestamp DESC)
ORDER BY 1;