In the Data Practitioner Spotlight series, we interview people who are directly working at the forefront of data across a range of sectors. In this edition, we sit down with Laura Dawson, who heads up data quality at marketing analytics startup EDO.
EDO helps marketers determine the effectiveness of their television ad buys. It does this by cross-referencing ads against search traffic for the brands in those ads: if an ad for Porsche airs on TV and, at the same time, searches for Porsche jump sharply, then the ad was probably effective.
The EDO product, then, is about accurately integrating two primary sources of data, TV streams and Google Trends data, and making that merged data available to marketers so they can make better ad-buying decisions.
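To make the cross-referencing idea concrete, here is a minimal Python sketch that compares search volume just after an ad airs with the volume just before it. The `search_lift` function, the five-period window, and the search counts are all hypothetical. EDO's actual methodology isn't described here.

```python
from statistics import mean

def search_lift(search_counts, airing_index, window=5):
    """Ratio of mean searches in the `window` periods starting at the airing
    to the mean in the `window` periods just before it (hypothetical metric)."""
    before = search_counts[max(0, airing_index - window):airing_index]
    after = search_counts[airing_index:airing_index + window]
    if not before or not after:
        raise ValueError("not enough data around the airing")
    baseline = mean(before)
    if baseline == 0:
        # No baseline searches: any post-airing activity is an unbounded lift.
        return float("inf") if mean(after) > 0 else 1.0
    return mean(after) / baseline

# Hypothetical per-minute searches for "Porsche"; the ad airs at minute 5.
searches = [10, 12, 11, 9, 8, 40, 35, 30, 20, 15]
lift = search_lift(searches, airing_index=5)
print(f"search lift: {lift:.1f}x")  # search lift: 2.8x
```

A lift well above 1.0 suggests the ad drove searches. A production version would also need to account for competing ads, other channels, and seasonal trends in the search data.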
What EDO's data team does
EDO has collection infrastructure that pulls in TV streams. The individual ads in the streams are then tagged with:
Brand
Product
Product Variation
Whether a promotion is being run for the product
So, for example, an ad for the Amazon Echo that advertises a discount would be tagged with:
Product Variation: Echo
Promotion: 20% off on Amazon.com
The data is the ultimate product and has to be accurate. Tagging is done manually by an outsourced team aided by automated tools. (This is a similar model to startups like Scale.ai in the self-driving car data generation space.) The annotation tool has a dropdown from which taggers select a tag. If they believe the correct tag is not already in the dropdown, they add a new one, which is reviewed the next day.
Initial tagging occurs during the US nighttime. When the US team wakes up, Dawson’s team normalizes those tags into a data taxonomy and deals with governance issues. This process is often very subjective.
“We’re dealing with questions like, is Amazon an umbrella brand, or are we going to break out Amazon Prime Video as a separate brand?” says Dawson. Sometimes, these decisions are guided by the brand itself (if the brand is a client). Other times, the data quality team makes a judgment call. “We’ll go to their website and see how they're organizing their products and look at how they’ve discussed their brand over time historically,” Dawson says.
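The normalize-or-review loop described above can be sketched as a small lookup: tags that match the taxonomy (or a known alias) map to their canonical name, and anything else is queued for human review. This is a hypothetical Python illustration. The `Taxonomy` class, the alias table, and the decision to roll Prime Video up into Amazon are assumptions made for the example, not EDO's actual rules.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Taxonomy:
    canonical: set[str]                                     # approved brand names
    aliases: dict[str, str] = field(default_factory=dict)  # lowercased variant -> canonical name
    pending_review: set[str] = field(default_factory=set)  # unknown tags for next-day review

    def normalize(self, raw_tag: str) -> str | None:
        """Return the canonical name for a raw tag, or queue it for review."""
        key = raw_tag.strip().lower()
        if key in self.aliases:
            return self.aliases[key]
        for name in self.canonical:
            if name.lower() == key:
                return name
        self.pending_review.add(raw_tag.strip())
        return None

taxonomy = Taxonomy(
    canonical={"Amazon", "Porsche"},
    # Hypothetical governance decision: Prime Video rolls up into Amazon.
    aliases={"amazon prime video": "Amazon"},
)
print(taxonomy.normalize("AMAZON"))              # Amazon
print(taxonomy.normalize("Amazon Prime Video"))  # Amazon
print(taxonomy.normalize("Porche"))              # None
print(taxonomy.pending_review)                   # {'Porche'}
```

If the team later decides to break out Amazon Prime Video as its own brand, only the alias entry and the canonical set change; the tagging flow stays the same.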
In addition to maintaining the data taxonomy and the correct labels, Dawson’s team is also in charge of tasks like:
Normalizing video quality: Two pieces of video might be semantically the same even if there are small differences in audio or video quality.
Responding to changes in Google Trends data
The final output of the data quality team at EDO is a dashboard used by the analytics team, coupled with custom reports that they run. The company also licenses the taxonomy it has developed and distributes it via an Amazon S3 bucket.
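For the video-normalization task listed above, one common technique (not necessarily the one EDO uses) is to compare perceptual fingerprints of frames rather than raw pixels, so that a re-encoded or slightly darker copy of the same ad still matches. The sketch below uses a toy average hash over tiny grayscale frames. The frame format, the helper names, and the two-bit threshold are assumptions, and it ignores audio entirely.

```python
def average_hash(frame):
    """frame: 2-D list of grayscale values (0-255). Returns a bit list with
    1 where a pixel is brighter than the frame's mean, else 0."""
    pixels = [p for row in frame for p in row]
    avg = sum(pixels) / len(pixels)
    return [1 if p > avg else 0 for p in pixels]

def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def same_creative(frames_a, frames_b, max_bits=2):
    """Treat two clips as the same ad if every aligned pair of frame hashes
    differs in at most `max_bits` bits (hypothetical threshold)."""
    if len(frames_a) != len(frames_b):
        return False
    return all(hamming(average_hash(fa), average_hash(fb)) <= max_bits
               for fa, fb in zip(frames_a, frames_b))

# Toy 3x3 single-frame clips: clip_b is clip_a re-encoded slightly darker.
clip_a = [[[10, 200, 10], [200, 10, 200], [10, 200, 10]]]
clip_b = [[[5, 190, 5], [190, 5, 190], [5, 190, 5]]]
clip_c = [[[200, 10, 200], [10, 200, 10], [200, 10, 200]]]
print(same_creative(clip_a, clip_b))  # True
print(same_creative(clip_a, clip_c))  # False
```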
What is a data taxonomy?
Dawson’s work on building a robust data taxonomy for EDO’s data brings up a question. What is a data taxonomy, and why is it important?
A data taxonomy refers to how tracked events and properties are named and categorized. Data taxonomy ensures that data is consistently categorized across multiple sources and channels, so that consumers of the data can derive meaningful insights.
Let’s say that you’re merging two tables: one where the purchase event is named “Checkout Submitted Order” and another where it’s named “Checkout submitted order.” Because the names differ in capitalization, a case-sensitive system treats them as two separate events and will not merge them automatically. If you then query for submitted orders, you’ll probably get an inaccurate count.
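A minimal fix is to normalize event names to one canonical form before merging. The Python sketch below collapses whitespace and underscores and title-cases the result. The `normalize_event_name` helper and the choice of title case as the canonical form are illustrative assumptions.

```python
import re

def normalize_event_name(name: str) -> str:
    """Collapse runs of whitespace/underscores to one space, trim, and title-case."""
    return re.sub(r"[\s_]+", " ", name).strip().title()

table_a = [{"event": "Checkout Submitted Order", "order_id": 1}]
table_b = [{"event": "Checkout submitted order", "order_id": 2}]
merged = table_a + table_b

# Before normalization, the merged rows contain two "different" events.
print(len({row["event"] for row in merged}))  # 2

# After normalization, both rows count as the same submitted-order event.
submitted = [row for row in merged
             if normalize_event_name(row["event"]) == "Checkout Submitted Order"]
print(len(submitted))  # 2
```

Title-casing is a blunt rule: it would turn a name like "iOS" into "Ios". That is one reason a taxonomy usually maps raw names to an agreed list of canonical events instead of relying on string transformations alone.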
Data taxonomy originated as a subject in library science, where it was used to figure out how best to categorize and name books. It eventually broadened beyond library science into data at large.
Data taxonomy in e-commerce
The earliest applications of internet data taxonomies happened in the e-commerce space. Online marketplaces like Amazon had to organize their product catalog in such a way that consumers would actually find the stuff they wanted.
In a recent blog post, for example, Etsy outlined their product taxonomy: a collection of hierarchies “comprising of 6,000+ categories (ex. Boots), 400+ attributes (ex. Women’s shoe size), 3,500+ values (ex. 7.5), and 90+ scales (ex. US/Canada).” These hierarchies form the foundation for the various filters and category-specific shopping experiences that make up the buyer experience.
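Etsy's post gives only the counts and examples quoted above, so the following is a hypothetical Python model of such a hierarchy, built from those examples (Boots, Women's shoe size, 7.5, US/Canada). It shows how category-specific filters could be derived from the taxonomy. The class names, the attribute-to-scale structure, and the extra size values are assumptions, not Etsy's schema.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Scale:
    name: str            # e.g. "US/Canada"
    values: list[str]    # e.g. ["7", "7.5", "8"]

@dataclass
class Attribute:
    name: str                                    # e.g. "Women's shoe size"
    scales: list[Scale] = field(default_factory=list)

@dataclass
class Category:
    name: str                                    # e.g. "Boots"
    attributes: list[Attribute] = field(default_factory=list)
    children: list[Category] = field(default_factory=list)

    def filters(self) -> dict[str, list[str]]:
        """One shopping filter per attribute/scale pair, listing allowed values."""
        return {
            f"{attr.name} ({scale.name})": scale.values
            for attr in self.attributes
            for scale in attr.scales
        }

boots = Category(
    "Boots",
    attributes=[Attribute("Women's shoe size", [Scale("US/Canada", ["7", "7.5", "8"])])],
)
shoes = Category("Shoes", children=[boots])  # hypothetical parent category
print(boots.filters())
# {"Women's shoe size (US/Canada)": ['7', '7.5', '8']}
```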
Video and content taxonomy
Prior to EDO, Dawson spent time as a taxonomy analyst at HBO. There, along with her boss, she pioneered a new company standard for language metadata based on IETF BCP 47 (the Internet Engineering Task Force's Best Current Practice 47, which defines tags for identifying languages).
Previously, different departments coded the Spanish language differently, including uppercase Spanish, lowercase Spanish, and other variations to represent specific dialects.
By creating the language metadata standards, Dawson created a single source of truth across the company. The standards streamlined the language metadata terminology for audio, subtitles, closed captions, and rights and licensing.
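As an illustration of what standardizing on BCP 47 involves, the hypothetical Python sketch below maps legacy free-text labels to BCP 47 tags and applies the standard's case conventions (lowercase language, title-case script, uppercase region). The legacy labels and the mapping table are invented for the example, and the case normalizer only handles simple language-script-region tags, not extensions or private-use subtags.

```python
from __future__ import annotations
import re

# Hypothetical legacy labels from different departments, mapped to BCP 47 tags.
LEGACY_LABELS = {
    "spanish": "es",
    "spanish (spain)": "es-ES",
    "spanish (latin america)": "es-419",
    "spanish (mexico)": "es-MX",
}

# Rough shape check for a simple tag: 2-3 letter language plus optional subtags.
TAG_SHAPE = re.compile(r"^[A-Za-z]{2,3}(?:[-_][A-Za-z0-9]{2,8})*$")

def canonical_case(tag: str) -> str:
    """Apply BCP 47 case conventions to a simple language[-script][-region] tag."""
    parts = tag.replace("_", "-").split("-")
    out = [parts[0].lower()]                 # language: lowercase
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha():
            out.append(sub.title())          # script: title case, e.g. Hant
        elif len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())          # region: uppercase, e.g. MX
        else:
            out.append(sub.lower())          # numeric region (419) or variant
    return "-".join(out)

def to_bcp47(label: str) -> str | None:
    """Map a legacy label or loosely formatted tag to a BCP 47 tag, else None."""
    cleaned = label.strip()
    if cleaned.lower() in LEGACY_LABELS:
        return LEGACY_LABELS[cleaned.lower()]
    if TAG_SHAPE.match(cleaned):
        return canonical_case(cleaned)
    return None  # unknown label: send to a human for review

for raw in ["SPANISH", "spanish (Latin America)", "ES_mx", "zh-hant-tw"]:
    print(f"{raw!r} -> {to_bcp47(raw)}")
```

The shape check only catches formatting problems; a real implementation would also validate each subtag against the IANA Language Subtag Registry.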
Principles of building a data taxonomy
Building and maintaining data taxonomies is probably one of the most labor-intensive approaches to high-quality data, and EDO does it because data is the product. Below, Dawson shares some of her hard-won principles for exporting data to its end users.
1. Think about who the data is for
Is it for a manufacturer, a distributor, or a consumer? “Back when Amazon first started up, they were using backend data from publishing warehouses that was really junky,” says Dawson. But consumers have different expectations. “There was this whole educational effort back in the late 90s and early 2000s to make that data more palatable for consumers.”
2. Understand the constraints of your system
You might have engineering or database constraints. How will you bend the taxonomy to work within them? EDO uses a three-tier taxonomy: brands, products, and product variations. So what happens when a product variation has spinoffs of its own? Says Dawson, “We don't have a fourth level. We have to figure out a way to set up the product variation field to concatenate all of these different spinoffs.”
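One way to fit deeper spinoffs into a fixed three-tier schema, as Dawson describes, is to concatenate them into the product variation field with a reserved separator so they can be split apart again. The sketch below is a hypothetical Python illustration. The separator, field names, and example brand and product are assumptions.

```python
from dataclasses import dataclass

# Hypothetical reserved separator; it must never appear inside a spinoff name.
VARIATION_SEP = " | "

@dataclass(frozen=True)
class TaxonomyEntry:
    brand: str
    product: str
    product_variation: str = ""  # may hold several concatenated spinoffs

def make_entry(brand: str, product: str, spinoffs: list) -> TaxonomyEntry:
    """Fold any number of spinoff levels into the single third-tier field."""
    for name in spinoffs:
        if VARIATION_SEP in name:
            raise ValueError(f"separator found in spinoff name: {name!r}")
    return TaxonomyEntry(brand, product, VARIATION_SEP.join(spinoffs))

def split_spinoffs(entry: TaxonomyEntry) -> list:
    """Recover the individual spinoffs from the concatenated field."""
    if not entry.product_variation:
        return []
    return entry.product_variation.split(VARIATION_SEP)

# Hypothetical brand and product names.
entry = make_entry("Acme", "Running Shoe", ["Trail", "Waterproof", "Limited Edition"])
print(entry.product_variation)  # Trail | Waterproof | Limited Edition
print(split_spinoffs(entry))    # ['Trail', 'Waterproof', 'Limited Edition']
```

Rejecting names that contain the separator keeps the concatenation reversible, which matters if reviewers later decide to collapse or expand the spinoffs.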
3. Don’t make your taxonomy too deep
“If your taxonomy is too layered, if it goes too deep, you're going to have a nightmare in terms of organization and monitoring that data,” says Dawson. “For us, three layers just seemed to be the level at which our clients responded well to it AND our reviewers were able to work with it.”
4. Engage a customer success team around the most-used data
Dawson tells us, “The more eyes are on the data, the more you need a dedicated person or team to react quickly to the inevitable incoming feedback, like when a brand should be capitalized and it’s not.”
5. If you’re managing a taxonomy, you will always be grooming it
A data taxonomy is not a set-it-once kind of thing. It’s a constant, iterative process. Dawson says, “You're always looking for ‘Can we collapse these?’ and ‘Do we have to expand these?’” It's a living set of rules that is subject to change as you acquire more information.
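Grooming often starts with spotting labels that may be duplicates. The hypothetical Python sketch below flags pairs of labels whose normalized forms are similar according to the standard library's `difflib.SequenceMatcher`. The 0.85 threshold, the normalization rules, and the example labels are assumptions, and a human still decides whether to collapse each pair.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize_label(label: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse repeated whitespace."""
    return " ".join(label.lower().replace("-", " ").split())

def collapse_candidates(labels, threshold=0.85):
    """Return (label_a, label_b, similarity) for pairs a reviewer might merge."""
    pairs = []
    for a, b in combinations(sorted(labels), 2):
        ratio = SequenceMatcher(None, normalize_label(a), normalize_label(b)).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs

labels = ["Prime Video", "prime-video", "Prime Videos", "Fire TV"]
for a, b, score in collapse_candidates(labels):
    print(f"{a!r} ~ {b!r} ({score})")
```

"Prime Video" and "Prime Videos" score as near-duplicates, yet they could in principle be a deliberate distinction, which is why the output is a review list rather than an automatic merge.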
As a marketing analytics company, EDO sells data as its product. The standards it sets for data quality, and the processes it uses to maintain that quality, offer valuable lessons even for data teams whose data is mostly consumed internally.