Read on for a lightly edited version of the transcript.
Kyle: Hey, everyone, welcome back to The Observatory. I'm your host, Kyle Kerwin at Bigeye. And today we're speaking with Ethan Aaron. Ethan is the CEO and founder of Portable, which helps data teams connect the longtail business applications which might not normally have an ELT connector available for them to their data warehouses, all with no code.
Previously, Ethan has held roles in BI teams, M&A, and strategy, and he's been a product manager… but he's also an expert at community building. And he's been on the front lines of a number of conversations going on right now about the modern data stack, tooling, and what's going on in data in general. So Ethan, welcome to the show. We're really excited to have you on today.
Ethan: Excited to be here. The conversation should be fun.
Kyle: So we've got a few questions for you today. And then, of course, the rapid fire at the end. To kick us off: you share a lot of thinking kind of very publicly with the community. I'm sure plenty of folks have seen you posting on LinkedIn and things like that.
The modern data stack is a pretty common theme in the conversations that you have with people in public. How would you explain what it even is? And specifically, if you were to recommend one to a new data team, what would that recommendation be?
Ethan: Totally. Let me start by saying the term “modern data stack” is overloaded at this point. Five years ago, it was a lot simpler. The modern data stack was an ELT tool, extract some data from your business applications, put it into a data warehouse, and build a dashboard. Goal number one was just to turn raw data from systems into insights to better make strategic decisions. It was simple, easy to understand, and modern for the time.
Today, it's a lot more complicated. There are hundreds of different tools. There are various levels of complexity. For a small startup, you might need two tools for your data stack. For a large enterprise, you might need 100. You probably don't need 100, but you might.
To me, when I think about data stacks and setting up data teams, the first question I always ask is: what is the objective? What are you actually trying to accomplish from a business perspective? If you are creating a data team, and the goal is to build five dashboards for your CEO to better run the business - that's the goal. What's the MVP way of doing that?
You might need a data warehouse. You might not; if all your data lives in a database, you could probably just query it directly. Maybe you need data from business applications. Maybe you use Zendesk or MailChimp or HubSpot or Salesforce. In that case, you definitely need a data warehouse. And you need an ELT tool to get the data into the warehouse. And you need a visualization tool to query it and build analytics.
If you're a big enterprise, and you're trying to build data products for customers or internal consumers, then data observability is critical. Real-time analytics are most likely valuable for you. You want embedded visualization capabilities.
So when I think about data stacks and data teams, it's a journey. Where are you on the journey? Are you literally starting and you just need a dashboard? Great. Don't buy third-party tools.
Are you an enterprise and you're trying to productionize things that started off as few dashboards and now they're a mess? At that point, you need more tools, you need more architecture, more data models, data contracts, that type of stuff.
So there are scenarios where people need everything, but for most teams, all you need is a data visualization tool, a data warehouse, and a way of getting data into it to prove value. From there, what's the next investment? And a lot of that just depends on the company and the use cases and the best way to create value for that enterprise.
Kyle: It sounds like you're a big proponent of simplicity, which I'm sure many people would appreciate, especially folks who are kind of tired of how many tools there are in the space these days.
One thing I didn't hear in your previous answer was any recommendation on any one of those components. Is that true? Do you not make a recommendation about, "hey, everyone should use this particular data warehouse"? It sounds like you're thinking more about the slots you need to fill to build your stack.
Ethan: Let's start there, with data warehouses. Among the biggest ones, you'll have Snowflake, BigQuery, and Redshift, and then the databases, whether it's Postgres or real-time databases.
For 90% of use cases, it literally does not matter. It's a question of cost. The biggest drivers of a data warehouse choice are going to be: is your company a Google company, an AWS company, or an Azure company, or do you need multi-cloud?
If you're a Google company, use BigQuery. It's great. If you’re an Amazon company - Redshift, right? If you don't want to be tied to either one, and you want multi-cloud support in an ecosystem, you use Snowflake. If you're just trying to query some data, and put it into a visualization tool, it honestly doesn't matter. You could use a Postgres environment on your local machine if it gets the job done. So there's that aspect.
With visualization tools, it’s slightly different. If you're a big enterprise with a ton of different users, really advanced visualizations, like the actual charts and aesthetics, are really difficult for these companies to build. Things like Power BI and Tableau can offer visualization capabilities that a lot of other companies can't.
There's a middle tier around companies that offer a lot of visualization capabilities, but also help with the modeling aspects. Things like Looker.
So I'm transparent about this online too: Retool is awesome. We use Retool, and it's very cost-effective. Not only can you use it to write SQL and build dashboards, but you can also use it for internal admin dashboards and CRUD applications.
The MVP stack, in my opinion, is about getting data in, putting it into a warehouse, and visualizing it. On the ELT front, there are two big buckets. There are your big connectors like Fivetran. If you need big, reliable connectors to Salesforce and SAP and Oracle, Fivetran is a reliable, awesome solution for that problem. It's expensive, but it's a great solution.
If you need things in your own cloud environment, or outside the US, there are options like Airbyte, Meltano, and Stitch/Singer, where you can deploy the code yourself. Where Portable fits in is the longtail stuff you can't find anywhere else. If you don't want to build a connector to your random HR tool yourself, but you need one, that's where Portable fits in.
But to me, the tools don't really matter. Like in ELT, if you need a connector from HubSpot to Snowflake or HubSpot to BigQuery, you have 10 options, and they probably all work. So at that point, it's better to spend your time building the dashboards and generating insights to create value than spending three months trying to pick the perfect tool.
Kyle: I think a lot of people would agree with that. It’s interesting, this conversation around keeping it simple and staying focused on the business needs. I feel like that's a theme that I've seen quite a lot in the public conversation.
What are the most interesting topics that are out there in the public space? Modern data stack aside, generally in the data ecosystem? What do you feel like are the most interesting topics that are being discussed these days?
Ethan: To me, there are a couple of things that are fascinating. If you think about the data world, like all the people that show up to Snowflake Summit, or go to Coalesce or talk about data stacks and data teams and analytics and operational analytics, the ecosystem of companies is small. Like it's probably 10,000 teams with a proper data function with the modern data stack.
Relative to the universe of companies, that is a very, very small fraction of companies. So there are two things to think about in the data world today. Both of them are massive opportunities. But they're both very, very different opportunities that need to be thought of separately.
Opportunity number one: those 10,000 companies have data observability problems. They have speed and latency issues. They need to move aspects of their pipeline from batch every 24 hours to real time. That permeates their entire data stack, and their entire culture, everything. So there is a big opportunity for those companies to move into the bleeding edge. How do you productionize things? How do you make them more scalable? How do you do them in real time? And I think a lot of the conversation is taking place there.
At the other end of the spectrum, I don’t see enough conversation taking place but I believe the opportunity is 10x as big. If we have 10,000 data teams out there today, there's probably another 90,000 or 100,000 that could be using a data stack.
So the question is not, how do we sell the next thing to the 10,000 people? It's how do we get the next 10,000 people onto something that resembles a data stack? Even if it's super basic to start. There are a lot of companies, and we work with a number of them, like Mozart Data, Keboola, and Y42, that sit at that intersection, helping data teams simplify things, not have to deal with as many contracts, and so on. But they also have the opportunity to help companies that have no data team, no data infrastructure, no analysts, etc.
And as an ecosystem, I think we could go compete feature by feature on which ELT tool is best for this connector. Or we could just go get 10 more people to sign up and build a data stack. To me, there's such an unbelievable opportunity to create value for people that don't have an analyst today, or that don't have a data stack today. That's underserved in the conversations, technologies, and opportunities that we're talking about in the market right now.
Kyle: All right, that kind of makes sense, right? Like, if you're thinking about where users of a data tool might live, you're gonna go look at those 10,000 companies where they're already spending money on a data stack. But you're right, there are a ton of teams that are doing things the old fashioned way.
I always joke that your barber down the street is never going to have a data stack, but they might be using Square Terminal for payments. And they may be getting analytics from that. They're just getting it verticalized through their service from Square. But they're probably looking at some sort of numbers week to week, to look at how their business is performing. So that's an example of how they'll never have a "data stack" in the formal sense, but in some way they actually might.
Ethan: So it's a funny example, because we integrate with some systems that a lot of enterprise data teams don't ever think about. So if you think about restaurants or barbershops, these things do come into play, like your point of sale system, whether it's Square, Toast, whatever.
Number two, your employee systems. You actually have employees that have salaries and benefits. For a barber shop, payroll is a really important thing to understand in addition to revenue.
And then the third one for restaurants, barbershops, and these types of companies is time-tracking, and checking into and out of work. And in a lot of scenarios, those are different tools. So if you think about someone going into the barber shop, or going home at night after running their small business, it would be great if they could just pull up a single dashboard that shows their bank account, their Square payments, their time-tracking for their employees, their schedule for the next week, and the marketing that they're running online or through advertisements, all in one place.
Is the barbershop gonna hire a SQL analyst? No, they won't. But should they have cross-platform insights available to them? I believe they should.
I think it's a massive opportunity to help unlock. It's technically very challenging, but it is a massive opportunity for us as an industry to think about things like barbershops and pizza restaurants, because there's a ton of value to be created there.
Kyle: So we’ve talked about the macro, and what's going on in the industry. Earlier, you mentioned data contracts. And this is something that I just interviewed somebody about recently. I think it's a super interesting concept. It's one of these topics that I've been hearing about lately. Anyone that follows you might know that you have an opinion on data contracts. And so I'd love to hear about first, what they are, and second, your take on them.
Ethan: Data contracts are the idea that you have a producer of data within an enterprise. Maybe the engineering team has a database of all the transactions. Then you have consumers downstream that want to do stuff with that. Maybe someone wants to build a product on top of that data.
But from a data perspective, someone else wants to build a dashboard, whether for internal purposes or external purposes. Historically, the analyst or the data team just starts querying the database directly. And then they build this dashboard. And then they use it for financial reporting.
And then one day the metrics are off, and they look over at the engineering team and say, “What did you do? You just broke our system.” And the engineering team looks over and says, “what are you doing with our data? We never knew you were doing this, we're not accountable to you, where did this come from?”
So there's a lack of communication taking place between the two. And the proposed solution is data contracts that are effectively a defined interface between that engineering team and consumers of that data.
The easiest example is, instead of the data analytics team querying the database directly (they can still do that for ad hoc analysis), in production scenarios there is a defined set of data that the engineering team commits to supporting, and that the data team relies on for consumption and product development. There's an understanding that it will remain consistent.
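To make the idea concrete, here is a minimal sketch (not from the conversation; the contract fields and function names are hypothetical, and real tooling such as JSON Schema or Great Expectations offers far richer validation) of what checking data against a declared contract might look like:

```python
# A minimal, hypothetical data contract check: the engineering team
# declares which fields (and types) it commits to keeping stable, and
# the data team validates incoming rows against that declaration.

# The "contract": fields the producer commits to supporting.
TRANSACTIONS_CONTRACT = {
    "transaction_id": str,
    "amount_cents": int,
    "currency": str,
}

def validate(rows, contract):
    """Return a list of violations: missing fields or wrong types."""
    violations = []
    for i, row in enumerate(rows):
        for field, expected_type in contract.items():
            if field not in row:
                violations.append(f"row {i}: missing field '{field}'")
            elif not isinstance(row[field], expected_type):
                violations.append(
                    f"row {i}: field '{field}' expected "
                    f"{expected_type.__name__}, got {type(row[field]).__name__}"
                )
    return violations

rows = [
    {"transaction_id": "t1", "amount_cents": 1250, "currency": "USD"},
    {"transaction_id": "t2", "amount_cents": "99", "currency": "USD"},
]
# The second row violates the contract (amount_cents is a string).
print(validate(rows, TRANSACTIONS_CONTRACT))
```

The point of the check is exactly the communication gap described above: a breaking change by the producer surfaces as an explicit contract violation rather than a silently wrong dashboard.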
There are downsides to this approach, but there are also a lot of benefits, especially in very, very large enterprises where things are decentralized. And that's where most of the conversation is today.
Data contracts are amazing. They unlock speed and flexibility, but also give everyone very clearly-defined interfaces. It's like the trend of service-oriented architecture and APIs for everything inside the enterprise. So they sound amazing. All the conversation today is generally positive about data contracts.
I personally think there's also a downside to them. They're not universally bad. But before deciding we need data contracts for everything, you have to ask yourself some questions. Because just like APIs, when you build an interface that other people build against, you have to maintain it. And you as an engineering team, or you as a data producer, now have these rigid constraints that you have to build within. So the key takeaway here is, for large enterprises using data for production use cases, like financial reporting, understanding what that interface looks like makes a ton of sense.
For most scenarios, to me, even at the large enterprises, I don't think the answer is "here's the interface, we're aligned, and that's perfect." I think the answer is a joint vision. If you're doing financial reporting on top of data from an engineering team, aligning on the actual fields of data that are coming out of the database doesn't actually convey to the engineers what the point is. They're not actually bought into what you're doing; they've just signed off that you are doing it. And it's kind of them passively saying, "that's fine, you can do that, we'll maintain it. Sure."
In my opinion, the better approach is a joint vision for what we're actually trying to do with the data. If the data is being used for financial reporting, and the engineering team is bought in that the data will be available to do financial reporting, and the data team is going to use it to do so, the actual interfaces and fields are secondary. If things change, but people know that the financial reporting needs to stay accurate, they will communicate with each other, they will come up with a solution, and the interface can change. And I think that's fine.
If everyone's aligned towards the goal, you'll find a way to get there. I think creating arbitrary definitions of interfaces with fields and schemas and types, doesn't align anyone with a goal. It's very possible a data team is saying, “Hey, we're doing this financial reporting.” And the engineering team says, “Have you thought about these other metrics that we also have?” Or “hey, we can go generate this for you.”
If you don't bring them into the conversation around the goal, and all you say is "this is the interface," you're missing out on collaboration opportunities. That's the biggest downside of data contracts, in my opinion.
Kyle: I see that makes sense. This reminds me of the phrase, “the map is not the territory.” You're saying a map is a useful thing. And there's a reason and a time to have a map. But let's agree on what the territory is in the first place. And then we can bring the map in if necessary.
I definitely would love to hear counterpoints on this. I know this is an interesting topic, if you have an opinion on data contracts, and you want to drop a comment below, I'm sure Ethan would love to chat with you.
So one more big question for you. If you could get data team managers and leaders to listen to one piece of advice that you have, that you think would improve things pretty universally for everybody, what would that thing be? And why do you think they would follow that advice that you have for them?
Ethan: As the leader of a data team, there are really two things you should be thinking about. What is the value you are creating? That's either going to be revenue or cost savings. Those are the only two ways. And what is the cost of doing it?
And the cost is a combination of technology and people. A lot of people ignore people costs when they think about their data team. And then they end up with 5, 10, 20, 30-person teams that cost millions of dollars. And they're focused on technology. They're not focused on driving revenue or saving money.
So to me, data teams aren't there to explore cool tools and technologies and write code. Data teams are supposed to impact decision making to drive revenue, automate workflows to save money, and do so in a cost-effective way, without costly technology and people. That is it. If you don't think about your data team as a profit and loss center, you're just burning money, in my opinion.
Kyle: And tying it back to business value, this is universal. This isn't just the data team, right? This is pretty much anybody who works in any business. But you've always got to bring it back to "What are we doing for the company as a whole?"
Ethan: And the other consideration there is, if you run the data team, and you say, “These are the decisions we impact for the CEO, these are the costs that we save by automating workflows with some ELT into a warehouse with reverse ETL.” And you quantify it and say, “This is the impact, this is the hours we saved,” it's a lot easier to get buy-in for more headcount, or for more technology or to expand the scope.
And if you don't think about that, and all you do is technology, why would your manager or the CEO ever give you more resources if you can't explain the impact on the business? So to me, if you're the head of data and you want responsibilities, and scope, and all that type of stuff, it's that simple.
Kyle: This is why, here at Bigeye, whenever we're talking to somebody who's looking at data observability, one of the first questions we always talk about is, “Where does data go in the business?” Like, let's not talk about the stack. Let's not talk about what causes a pipeline to break. Like you said, that's all secondary, right?
The question always stems from if a pipeline goes down, what's the actual impact? Who's impacted? Is it your customers? Is it your executives? Is it a partner? How are they impacted? Who finds out about it? So always from the business, working backwards.
Ethan: I don't know if I'm allowed to ask questions. But which one do you find is more impactful? Quantifying that with numbers, like "This is the number of hours we saved, the dollars on technology that we didn't spend," or anecdotes? Because I believe both are extremely powerful. A lot of numbers can come across as fake. But with a testimonial, the CRO will be quoted as saying, "We would have lost $500,000 in deals if it wasn't for our pipelines being robust and observable." When I think about the value you're creating, I think in terms of both the quantifiable value and the hard anecdotes from leaders. Is that what you're seeing as well?
Kyle: Totally. You were in product management. I was as well prior to this, and in some sense, I still am. But the way that I always used to run research programs was, first it was qualitative. We'd talk to customers and say, "Okay, here's your quote, here's your anecdote."
And then after that, you can then go out with a survey, and you can get some actual quantification on it. You can ask, how often is this actually breaking? How many people are actually impacted by this? What was the predicted dollar impact to the company?
But I feel like the place to start is always, if a human being somewhere was not bothered by this, that's kind of the end of the thread, right? If it was a real problem, and there's genuinely no one at the entire company who felt any pain from that problem, that seems like a pretty rare case.
The anecdotes thing is not enough by itself to justify an investment, but it’s super important to know, what human being did this impact and how did it affect them?
Ethan: 100%. Totally.
Kyle: All right, great. You're ready for rapid fire?
Ethan: Yeah, let's do it.
Kyle: Okay, question one. You can either have unlimited battery life on all your devices, or you can have constant free Wi-Fi everywhere you go. Which do you choose?
Ethan: Wi-Fi. Having access to information will be so simple, and battery technology is actually getting to be pretty good.
Kyle: Fair enough. Okay. Number two. You get to choose a board of directors to help you run your own life. Who's the first person you put on the board?
Ethan: My mom.
Kyle: Great answer. Shout out to mom.
Ethan: She will watch this at some point, don’t worry.
Kyle: Cool, hi mom! All right, number three. Overnight, you can magically gain expertise in any one technical skill or capability of your choice. What is that thing?
Ethan: Learning. I don't know if that's a technical skill. But one of the best things to learn, in my opinion, is how to learn efficiently. Like with reading books, I like speed reading and other stuff like that. Because it compounds. If you can learn how to learn more efficiently, then the next time, you can do it five times as fast.
Kyle: Compounding advantages. Sounds good. All right. Ethan - great answers. It was a pleasure to talk to you today. If you want to learn more about Portable, you can find a link to it in the description down below. If you have opinions on data contracts, I would love to hear from you, and Ethan will be responding to you down in the comments. Ethan - thanks for being on the show today.
Ethan: Thanks so much for having me.
Kyle: Alright, everyone, thanks for tuning in. We'll see you next time.
Kyle (re-recorded outro): All right. Well, thanks so much for joining us today on this episode of The Observatory. Today, we spoke with Ethan, who's the CEO and founder of Portable. If you want to learn more about Portable, you'll find a link down in the description below. If you have comments about data contracts, if you think they're great, if you're using them, if you're not using them and you think they're terrible - leave a comment down below. Ethan would love to hear what you have to say, and he'll be down there as well. Ethan, thanks for being on the show today.