THE HACKIES: DIY marketing data warehouse recipe

DIY marketing data warehouse

The MarTech Hackies 2017

This article is a guest post by Alex Sirota of NewPath Consulting. It was entered into The Hackies essay contest for the upcoming MarTech conference. Like it? You can register your vote in the contest by sharing it on social media, especially LinkedIn, Facebook, and Twitter.

Let’s use Amazon Web Services Relational Database Service (AWS RDS) to create a cloud based relational database system!

AWS RDS is a cloud platform that makes it easier to set up, operate, and scale a relational database in the cloud. AWS RDS provides cost-efficient, resizeable capacity for an industry-standard relational database and manages common database administration tasks.

AWS RDS currently supports these databases:

  • Oracle on Amazon RDS
  • MySQL on Amazon RDS
  • Microsoft SQL Server on Amazon RDS
  • PostgreSQL on Amazon RDS
  • Aurora on Amazon RDS
  • MariaDB on Amazon RDS

For large data warehousing needs, Amazon Redshift is also available and supported by Fivetran. Here’s a tour of Fivetran.

Below is a recipe for getting going with your own data warehouse to start building powerful business dashboards with tools like Looker.

Ingredients:

  1. Amazon Web Services RDS — free for the first 12 months (certain restrictions apply, see the RDS pricing page).
  2. Setup a trial account at Fivetran.com.
  3. Install pgAdmin or SQL Workbench/J (with JDBC drivers for your operating system).
  4. Need some working knowledge of SQL to grant permissions on your data warehouse to Fivetran.

Recipe:

  1. Setup an Amazon RDS instance using PostgreSQL. Amazon Redshift and Snowflake are higher-end enterprise tools that are also potential data warehouses, but we haven’t tried them out yet. You can use your amazon.com account to login to AWS and setup PostgreSQL on AWS RDS.
  2. Once AWS RDS is setup you will be able to connect to the instance from your machine only (the default access control for a new AWS RDS).
  3. Modify access control groups for AWS RDS to enable access for Fivetran to access your database.
  4. Create necessary user for Fivetran using your administrative account and grant privileges for your database to the Fivetran account (SQL). Make sure you commit your changes!
  5. Setup data sources in Fivetran, authenticate and start the data pipeline (Google Analytics and Stripe are interesting ones to start with).
    1. You will need to setup a schema (a collection of tables).
    2. You will need to specify which target table will be populated.
  6. Use your SQL Workbench or pgAdmin to verify tables are being created in your data warehouse.
  7. The data pipeline will continue to populate data as new data is created in the data sources. Any changes in the schema will be reflected in your data warehouse but no data will ever be deleted. Changes will be marked in the target tables.

Time to Bake:

Probably 3-5 hours depending on how quickly you can click the UIs!

Alternate Prep Techniques:

The functionality of tools like Zapier should not be confused with tools like Fivetran. Zapier is great at integrating two or more cloud apps that do not have built in integrations.

So, for example, using Zapier you can have a form filled out in Formstack, trigger an event in a Slack channel. Formstack does not support integration with Slack, but Zapier bridges this gap so that data can trigger an event in a remote cloud app. Zapier is sort of a “crazy glue” to connect disparate cloud apps that don’t have a native integration.

Have fun integrating data and building data pipelines!

What did you think of this article as an entry in The Hackies essay contest for the upcoming MarTech conference? If you liked it, you can register your vote in the contest by sharing it on LinkedIn, Facebook, and Twitter.

Have a marketing/technology/management “hack” that you want to share with the world? Consider entering The Hackies yourself — we’d love to learn from your experience and insight!

Share

Comments

  1. This is a great starter article. Wouldn’t underestimate the work needed to develop a good tracking plan and to ensure you’re pushing the right data from across your customer journey into your data warehouse. in my experience, tools like Segment.com are also very beneficial at establishing a reliable source of canonical customer data.

    • Totally agreed. This was just meant as a kick start to check out the state of the art tools and services that have come to the market in the last 2 years. The solutions are transforming the state of the art due to the speed of provisioning and deployment capabilities.

      I will definitely check out segment.com and hopefully look to the community to comment on other tools and services that can act as more ingredients to make a more tasty dish.

  2. One more alternative set of ingredients I have been experimenting with which is even cheaper for dev purposes. It’s actually FREE for sotring less than 10,000 rows, forever.

    1. Setup a Heroku Postgres database at https://www.heroku.com/postgres. The free plan stores up to 10,000 rows collectively from all your schemas.

    2. Setup the Stitch data pipeline over at stitchdata.com – it’s free for 5M rows, plenty for testing multiple integrations.

    3. Select an integration from Google Analytics and try out a batch of dimensions and metrics. Try a Webhook from your favourite web app. Webhooks are a free Stitch integration.

    4. Check out singer.io — an open source library for creating “taps” and “targets” to connect SaaS data sources to a database or even Google Sheets.

    An explosion of data pipeline options!

Leave a Reply