Data web marketing and the law

In the future of online marketing, the biggest internal tug-of-war might not be between the marketing department and IT — I maintain the belief that marketing will subsume technology under its own management umbrella — so much as it will be between the marketing department and legal.

As we enter the next era of the web — Web 3.0 or the web of data or the semantic web, whatever you prefer to call it — the big debate is going to be about legal, policy, and intellectual property rights around data.

How much of our data should we share? With whom?

Who will we allow to use that data, under what circumstances? How will we enforce those policies?

Do we actually own the data we think we do? Or do we have obligations to the source of that data (aggregate analysis of customer behavior, for example) that limit how we can use it?

What about using other people’s data? Do we need explicit permission? What constraints do we have in reuse or redistribution? What if the provider’s policies change?

Since data web marketing is all about sharing your data and leveraging other people’s data — open, linked data as Tim Berners-Lee calls it — as a whole new kind of marketing medium, these questions can become significant hurdles.

Ultimately the real question should be: how do we extract the maximum value from our data, by sharing it, or not sharing it? Within a company’s huge data universe, there are probably multiple answers along that continuum, depending on which particular data you’re talking about — and how its use relates to the business model and marketing strategy.

Copyright and data reuse

There’s an interesting article in the latest MIT Sloan Management Review, Finding New Uses For Information (sorry, registration required) by Hongwei Zhu and Stuart E. Madnick, that tackles some of these legal questions from both sides: data owners and data reusers.

How can someone own data and control its use if the data is openly accessible via the Web? What is the best strategy for those who think they own the data? And what is the best strategy for those who want to reuse data that is available via the Web?

Without any doubt, data can be an important asset of a business. The business “owns” the data when it can fully control who can access the information and how it should be used. But when a company makes data accessible to the public on the Web, its “ownership” to that data will be determined by intellectual property law.

The article mentions that of the four kinds of commonly recognized intellectual property rights (trade secrets, trademarks, patents, and copyright), only copyright is even conceivable as the basis for legal protection in this context.

Unfortunately, at least in the U.S., the applicability of copyright protection to databases is somewhat murky. One relevant Supreme Court case, Feist v. Rural, emphasized that copyright only protects originality. “Information” is not copyrightable, but “collections” of information can be — if the collection represents some minimal degree of creativity. Individual pieces of factual data are not protected.

The European Union has slightly stronger protections in place with its Database Directive (Directive 96/9/EC), which grants special rights to the creators of databases, to “protect the qualitatively or quantitatively substantial investment in either the obtaining, verification, or presentation of the contents”.

Given the fuzziness of copyright protection in this context, Zhu and Madnick recommend the following strategies — really more business strategies than legal strategies — for database creators:

Sell “private” data. Have a portion of your data that is publicly available on the web that anyone — including your competitors — can freely use and redistribute. But then also have more advanced or detailed data that’s related to that public data, but is controlled privately and only given to customers under specific terms.
Become a reuser. If you can’t beat them, join them. Essentially, the value a business extracts from data has to shift toward the application in which the data is used and related to other data, rather than coming from the raw data itself.

For data reusers, they advise:

Differentiate. Since copyright law in the U.S. tends to reward originality and creativity, finding new ways to leverage the sourced data — particularly a partial subset of data, not a wholesale copy — and add value is a good strategy. Aside from pleasing the lawyers, this is also likely to be a good marketing strategy to give users something better that they can’t get quite the same way anywhere else.
Analyze the data. Sometimes the greatest value can be provided by crunching the data from multiple sources and then only sharing the net results — not the underlying data that led to those results.
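
To make that last strategy concrete, here is a minimal Python sketch of what "sharing only the net results" could look like. Everything in it is an assumption for illustration: the two sources, the field names (region, order_value), and the summarize helper are hypothetical and do not come from Zhu and Madnick's article.

```python
from statistics import mean

# Hypothetical rows pulled from two separate sources (illustrative only).
source_a = [
    {"region": "east", "order_value": 120.0},
    {"region": "west", "order_value": 80.0},
]
source_b = [
    {"region": "east", "order_value": 100.0},
    {"region": "west", "order_value": 60.0},
    {"region": "west", "order_value": 70.0},
]


def summarize(*sources):
    """Combine rows from every source and return only per-region aggregates."""
    by_region = {}
    for source in sources:
        for row in source:
            by_region.setdefault(row["region"], []).append(row["order_value"])
    # Only counts and averages leave this function, never the raw rows.
    return {
        region: {"count": len(values), "average": round(mean(values), 2)}
        for region, values in by_region.items()
    }


public_summary = summarize(source_a, source_b)
print(public_summary)
```

The point of the design is that public_summary is the only thing you would publish; the underlying rows stay behind your own wall. Whether a given provider's terms permit even that kind of derived output is a separate question.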

Terms of service and conditional access

While I liked Zhu and Madnick’s article — always better in my book to think about innovation instead of case law if you can help it — I felt that they overlooked the 800-pound gorilla in how data restrictions are predominantly enforced on the web today: terms of service and conditional access.

One of the examples given in the article was a case some years back when eBay sued Bidder’s Edge, an auction aggregator that has since gone out of business. Bidder’s Edge was scraping eBay with web crawlers to grab data about its auctions. eBay objected and won a preliminary injunction barring Bidder’s Edge from doing so.

Although copyright infringement was one of the arguments, the injunction was actually granted on a separate argument: the claim of “trespass”. The judge wrote, “eBay’s servers are private property, conditional access to which eBay grants the public.” The concern was framed in terms of these automated spiders placing an undue load on eBay’s servers. A good account of the case can be found in the Computerworld article When does ‘spidering’ equal trespass?

But even that debate seems somewhat moot in the modern world of data access over the web where you need to agree to the terms of service for a site before you can even see the data.

Just the other day, there was an article on TechCrunch about Amazon Killing Mobile Apps That Use Its Data. The new Delicious Library iPhone app — which was supposedly very cool — used Amazon’s Product Advertising API for some of its content. However, Amazon’s terms of service were recently updated to state:

You will not, without our express prior written approval requested via this link […] use any Product Advertising Content on or in connection with any site or application designed or intended for use with a mobile phone or other handheld device.

So Amazon simply said: shut it down, or we’ll shut you down.

Aside from the legal channels Amazon or other such sites could take in enforcing their terms and conditions, the most direct way for them to handle it is to shut down the account or block the IP address of the application pulling the data.

It’s not just Amazon. Almost every major web site (Facebook, Twitter, numerous Google applications and APIs) has terms of service that it can use to throttle data reuse. Given that APIs are often provided specifically to enable data reuse, getting into the weeds of what constitutes permissible reuse could get quite messy.

And even if you find a path through the thorny underbrush, what happens if those terms of use are changed overnight?

Well, as I said at the beginning of this article, although with no joy in saying it, I think the legal department may have a bigger role in the future of online marketing than either marketing or legal would like.
