The rise of the marketing data lake

The Marketing Data Lake

The following is a guest post by Doug Kessler, co-founder and creative director of the B2B marketing agency Velocity. They worked closely with Informatica, one of their clients, to produce educational content on marketing data lakes, including a book by Franz Aman and Anish Jariwala. It’s some of the best writing on the topic that I’ve come across.

It’s way too late to call the rise of the marketing operations function. But as the role matures, we’re starting to see more and more ops people spending more and more of their time on one thing: data wrangling.

A recent long-form article by Informatica, What Is Marketing Operations?, gives a thorough update on how the role is changing. And it’s all about data. Data integration between cloud and on-premise applications (or cloud-to-cloud integrations); data quality; master data management…

You’d expect Informatica to feel this way, since they make data management software. But even three years ago, it was a rare marketing department where you’d hear conversations about this kind of thing. Today, it’s a rare one that isn’t talking about them.

Having gone wild on all the things that marketing apps let us do, we’ve all now hit the wall. Because each of these apps generates its own Mississippi of data. And each tags and structures that data in its own special way (including a unique convention for customer IDs). So when it’s time to bring all those data puddles together — for analysis, modelling, segmenting, whatever — it’s an unholy mess.

The number one imperative for marketing ops

This is the single most important imperative for every marketing operations team: to fight against data fragmentation and find ways to unite the whole marketing stack at the data layer.

The traditional way to do that was to extract, transform, and load (ETL) everything into a marketing data warehouse. And that still works for lots of use cases.

But data warehouses were invented way before things like cloud apps and social media feeds. So instead of being optimized for the kinds of data that marketers deal with every day (high-volume, un-structured, multi-sourced), they’re optimized for fewer sources of data that are pretty well-structured.

Because of this, marketing operations teams often find warehouses expensive, hard to work with, and a nightmare to change (to add new data sources, for instance).

Just when we thought we’d escaped the clutches of IT, the data warehouse forced us back into the IT queue to do even simple things like run new reports. And skyrocketing data volumes cause costs to skyrocket too.

Enter the marketing data lake

Here’s where the data lake comes in. Specifically, the marketing data lake.

A data lake is a data store, just like a warehouse, but with some important differences:

DATA WAREHOUSE DATA LAKE
Data types Structured Unstructured or multi-structured
Database schema Schema-on-write Schema-on-read
Cost Expensive storage Low-cost storage
Ideal for Penny-perfect, super-secure financial reporting Agile marketing analytics and decision-making
Agility Difficult to add new reports and queries Easy to add new reports and queries

Source: The Marketing Data Lake by Franz Aman and Anish Jariwala

Data lakes have no problem with unstructured or multi-structured data. That’s what they’re for. Just dump in any source and worry about it later. They run on low-cost storage (usually in the cloud, like in AWS) so you can keep more data.

The big news about data lakes, though, is that line about database schema — the way a database structures its data. A data warehouse is “schema-on-write,” meaning you need to know how you want to structure your data before you build and load the warehouse. It’s a fundamental decision that you need to make right up front. Changing it sucks.

A data lake is “schema-on-read,” which means you worry about structuring the data only when you’re making a query, running a report or building some kind of new dashboard or use case.

That small-sounding difference is actually huge for marketers. Because it means we can capture more data from more sources — even before we know how we might use that data.

Garbage In, Garbage Out (GIGO)

Of course, it still pays to put clean, high-quality data into your data lake (or you’ll spend 80% of your time on housekeeping later instead of actually putting the data to work). The need for good data governance doesn’t go away with data lakes. If anything, it’s even more important.

So it helps to standardize important things like campaign codes and consistent definitions of things like “customer,” “sales region,” and “product” before you go too far. But with basic governance in place, a marketing data lake will do all kinds of good things for your marketing operations:

  • Make it easier to combine data sources for easy analysis — like syncing your web analytics with your marketing automation or CRM or email performance data or…
  • Give you the foundation for that elusive “single view of the customer” – the holy grail of marketing.
  • Stay agile — spinning up new use cases and adding new data sources quickly and easily.
  • Make your data more reliable — so your models are better and your predictive analytics actually predict.

Bottom line? A marketing data lake lets you really take control of your own marketing data instead of holding it hostage inside all the apps you use.

Make nice with IT

That crack about “escaping the clutches of IT” — I was just kidding. The new world of data-driven marketing operations demands a close relationship with the folks over in IT.

Even if you knew how, would you really want to spend your time spinning up a Hadoop cluster on AWS, building your data lake, and integrating all the different data sources you’ll be working with?

This is the stuff modern IT folks are great at. And, as long as you work closely with them, they’ll get you there much, much faster and in far better shape.

So go make friends with IT. Talk to them about a marketing data lake. And become the master of your own marketing data.

Thanks, Doug!

Share

Leave a Reply