Google Analytics to Data Warehouse
What is Google Analytics ?
Google Analytics is a cloud-based web analytics platform provided by Google. Google Analytics enables you to track data on your website through dimensions that allow you to sort through your website’s visitors and metrics that monitor website activity, among other features.
Exporting data collected by Google Analytics into a Data Warehouse gives you the opportunity to blend with other marketing data sources, thus enabling you to gain profound insights on your business.
How to export data from Google Analytics to Data Warehouse?
Google Analytics lets you export data using APIs. In particular, the Core Reporting API allows you to access dimensions and metrics for your chosen reporting view from outside of the GA interface.
Exporting data using Google Analytics API involves writing the script to identify/access the data, extract the data, transform the data, create a repository in your data warehouse, and finally load the data in the data warehouse.
You will need an understanding of how APIs work, a Google Analytics account, and a Data Warehouse of your choice. The steps involved in loading the data from Google Analytics to a data warehouse are as follows:
Step 1: Identify Your Data
The first step is to identify/access the data in Google Analytics. You can do this through the Google Analytics API. Google Analytics provides a rich API that exposes several endpoints with which you can programmatically interact. The data from Google Analytics is in the form of reports which can be narrowed down to a specific time period of your choice.
Step 2: Extract Your Data
You can use the API to extract data after identifying the data and timelines you want to see. The dashboards and reports you generate with the API can also be used in your Google Analytics account, in addition to exporting to your data warehouse.
Step 3: Transform and Prepare Your Data
You must first transform your data to ensure that it is in a format that can be accepted by your data warehouse. For example, it will be easy to use a JSON format for Google BigQuery but you may have to choose to convert to a CSV or SQL format for more traditional relational databases like Microsoft SQL Server. You also need to ensure that the data types in Google Analytics map to the data types of your chosen data warehouse.
Step 4: Create a Data Receiving Repository In Your Data Warehouse
Creating a data stage for your data could make your data transformation easier to perform before it is finally ingested for analysis/reporting. This is easy to create in data warehouses like Google BigQuery or Snowflake.
Step 5: Load Your Data
It is advisable to design a schema for your chosen data warehouse and then map it to your Google Analytics data. In this way, you are almost ready to load your data from Google Analytics to a data warehouse after making sure that all the steps are completed to suit your needs. The specifics of this step depend on your chosen data warehouse.
For example, in Snowflake you can use the COPY INTO SQL command. Alternatively, you might have to use a command-line tool in other data warehouses like Google BigQuery.
Limitations of Google Analytics Data exported using APIs.
Google Analytics only allows you to connect to the reporting layer of their API, so you cannot access all the raw data captured by Google Analytics. Instead, Google requires that it must be queried via a predefined report, but this prevents you from getting deeper into the data and exploring it from all sides.
These are the key limitations of Google Analytics Data exported via Google Analytics API.
Data Sampling: Even Google’s processing servers can’t always handle endlessly-large volumes of data in a finite amount of time. Thus Google Analytics applies sampling when you request a large amount of data.
This data sampling is different from the limit of 10 million records for hits or events, per property, per month.
Aggregated Data: Google Analytics API does not provide access to hit level data, so you’re storing the sampled, pre-formatted data that’s returned to you and that you can view through the Google Analytics user interface.
Fragmented Data: Google Analytics limits the number of dimensions and metrics you can include in a Google Analytics API request. Not every metric can be combined with every dimension. Each dimension and metric has a scope: user-level, session-level, or hit-level.
Fortunately, most of these problems can be solved with raw data. Pretty cool, huh? Let’s figure out how to get raw data.