The Observatory
December 19, 2022

Colin Zima, Omni

Colin Zima, founder of Omni, talks about workbook models, lessons learned from Looker, and nudging toward scalability.

Read on for a lightly edited version of the transcript.

Kyle: Hey, everybody. Welcome back to The Observatory. I'm your host Kyle from Bigeye. Today we're talking with Colin Zima. Colin, you've worked on Google Search. I saw you had a dynamic restaurant pricing startup, which you sold to HotelTonight, where you then led data. You were at Looker for almost a decade before recently leaving, and you just started Omni, I think less than a year ago. Is that right?

Colin: Yes, in February.

Kyle: Cool. All right. Well, Colin, welcome to the show. I'm sure everybody's pretty excited to hear a little bit about what Omni is. It's a super cool concept. So I'm excited to talk to you about it today.

Colin: Thanks for having me. Excited to tell you about it.

Kyle: Business intelligence and data engineering are both obviously flooded with a ton of new tools right now. Conceptually, with Omni, why don't we start with the problem you feel you're addressing that's not already being solved in the space?

Colin: I think the operative word is “balance.” On one side of the world, with products like Looker, you've got all these tools that are built around centralized data models. That was sort of the way BI was done historically, where there was a single layer that defined all of your metrics. And that's actually how most BI tools started, back in the late 90s and early 2000s, with things like BusinessObjects, Cognos, and MicroStrategy.

On the other side of the world, there's this concept of what we call “workbook analytics.” Tableau is a canonical example. Excel, from way back in the day, is another great example, and even SQL runners now, things like Mode and Periscope, before it became part of Sisense.

The idea in the workbook model is that you get isolation. An individual user can move very, very quickly. You get this diaspora of different objects, but people can do analysis very, very quickly. And so you've got these two separate worlds. At Looker, we saw so many companies experiencing tension between these two approaches. Do I put things in a centralized layer so that everyone can agree, even though any given individual is then slowed down in their day-to-day? Or do I lean into the workbook style of the world, where everyone can do anything they want to do, but there's no reconciliation between users?

And so the core of what Omni is, is a balance between the workbook side of the world and the centralized analytic side of the world. What that means is that at any given moment, a user can operate in a workbook, they can work with complete isolation, they can move on their own, and the analysis they're doing is not impacting other users in the organization.

The key is, rather than thinking about workbooks as fundamentally disconnected from a centralized model, we maintain a workflow so that all of the logic in the workbook can be picked up and promoted into a centralized model. We can layer isolation and speed and power and all the things that a given individual user wants to do on top of a centralized data model. An organization can make more nuanced decisions about what happens in isolation, and let a user do an analysis that is not published to everyone, and then make decisions on the backside about how they actually want to govern metrics.

So what should get pushed into a shared model, à la Looker? What can even get materialized into the database with something like dbt? Instead of thinking of a black and white “yes, no” scenario, it's more nuanced. How much do I want to push down? And how far do I want to push down? It lets users move more quickly to start, and also lets people govern a little bit more thoughtfully over time.

Kyle: We need to double click here a little bit. I'm the user. My default mode is going to be opening up one of these workbooks. I'm going to start doing my work. Let's say some of my work is going to be leveraged directly off of the existing materialized data model. And then some of what I'm doing is modifying that model for that specific work. You mentioned that there's a continuum. It sounds like there are a couple of different options. What are those?

Colin: I think it almost helps to think of an example and how it works in a bunch of different scenarios. Let's say we're a company and we have a CSM that needs to do a net new analysis of customer health. This is actually how we built customer health at Looker. In the workbook side of the world, they will launch a workbook, they'll do some analysis, and they'll publish it. And they'll say, “This is customer health.” The challenge you ended up with there was complete isolation. It exists in this workbook, and if you actually want to promote it out to the organization, you need to effectively do surgery on the workbook.

Kyle: Like a CSAT score, customer health score. And you define that inside the workbook.

Colin: Exactly. And so that's the workbook side of the world. In the Looker side of the world, or the centralized-model side of the world, the way that gets done is that before you can even start building that health score, you're dipping into the data model. You're essentially creating concepts in that data model, you're publishing them, and then you start to build your analysis.

So you're almost going through this process of modeling data before you can even start your analysis. In the Omni side of the world, we want to think about data modeling as doing the analysis itself. So a user builds that analysis in their workbook. And rather than that workbook being completely disconnected from that core data model, at any time, a metric can be picked up out of the workbook and essentially pushed down a layer. If we make a measure around whether there are more than 10 active users, for example, and it's some sort of trivial measure, we can actually lift that metric out of the workbook and into the centralized model. You can almost think about it as pulling it out of Tableau and sitting it in Looker.
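
As a concrete sketch of the kind of measure Colin is describing, here is what that “more than 10 active users” flag might look like as plain SQL in a workbook. The events table, the column names, and the SQL itself are illustrative assumptions, not Omni's actual model syntax:

    -- Ad hoc workbook measure: flag accounts with more than 10 active users
    -- over the trailing 30 days. Schema is hypothetical.
    SELECT
      account_id,
      COUNT(DISTINCT user_id) AS active_users,
      COUNT(DISTINCT user_id) > 10 AS is_active_account
    FROM events
    WHERE event_at >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY account_id;

Promoting the metric would mean this definition moves into the shared model, so every workbook references the same logic instead of each analyst re-deriving it.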

Rather than having to either materialize or model that metric before you start, you're building the health score. And then you're thinking, “Are these metrics universal? Should they get pushed down into the virtualized model?” And then you're even potentially deciding whether those need to get lifted out of the virtualized model and put into an ETL cycle.

It’s kind of funny, because even with Looker, I built the health score in an afternoon. We started publishing it out, and we liked it. We then realized the Looker model was too brittle a place for it, so we actually picked it up and put it in Airflow.

And the idea is, rather than thinking about that as ETL, or sort of heavyweight extraction, or pulling in a new user, we can really guide workflow around just picking up that metric and sliding it down one level in your ETL cycle. The idea is that now more people can actually model, but they're not going directly to a data model and modeling data. They're doing analysis, and then a data team can come back on the back end and make decisions about which analyses are canonical and should actually be shared, versus what should sit in isolation and stay in isolation.

Kyle: You kind of alluded to the idea of a model, or another layer that sits within Omni between the materialized data model and the user's workbook. Is that actually what's going on? Do we have a semantic layer inside Omni?

Colin: Yes, there is a very lightweight semantic layer. Another one of these small reversals that we're trying to make with Omni is thinking about that more as just a SQL layer that is compiled, versus a language or anything sort of special. It encodes things like joins and dates and relationships between metrics.

So, revenue minus cost equals profit. But it's written in the front end in SQL as you actually build analyses, and then we're simply encoding it in this layer so that you have relationships when you need them. And the idea is that there are fewer proprietary pieces in our modeling layer; it's blocks of SQL. That also makes it easier for us to lift it out of our model, and potentially even push it down below the Omni layer.

So there is encoding of joins, so that when I have two tables in the data warehouse, they don't need to be materially joined together. But if you want to do that, we want to provide workflow, so that you can make that decision progressively. And again, not have to completely rework every single system above to make a decision about how you optimize an analytics set.
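
As a rough illustration of what that join encoding buys you, under a hypothetical schema: if the layer records once that orders.user_id points at users.id, a workbook that touches fields from both tables can compile to ordinary SQL, with the derived profit metric written as plain arithmetic:

    -- The model stores the relationship orders.user_id -> users.id and the
    -- definition profit = revenue - cost once; a workbook query touching both
    -- tables then compiles to SQL like this. Schema is hypothetical.
    SELECT
      users.region,
      SUM(orders.revenue) AS revenue,
      SUM(orders.cost) AS cost,
      SUM(orders.revenue - orders.cost) AS profit
    FROM orders
    LEFT JOIN users ON orders.user_id = users.id
    GROUP BY users.region;

The tables stay separate in the warehouse; the join only exists in the compiled query, which is the progressive decision Colin describes.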

Kyle: So from workbook to the lightweight semantic layer, I can just go set up Omni and this stuff works out of the box. Getting it down into the materialized layers, it sounds like things get interesting. Are we, or am I, giving you credentials to modify my repo with my dbt models? Are you getting a write account? How does that work?

Colin: Transparently, we haven't figured out all of these answers yet. Early on, moving things like fields is something we can do relatively trivially. It's TBD whether we're going to host a dbt project on behalf of the customer.

The idea is that we could do it out of the box, but again, using things off the shelf rather than proprietary transformation pieces. And then if you would like to self-host it, the idea is that you can lift it up and just as easily move it into your own project or system. But yes, if you want us to write things, we're going to have to have write access to your database. So we're definitely working through database permissions, superuser access, and all that kind of stuff. But while we have this semantic layer, the idea is not to trap the user in the semantic layer. The idea is to use it to actually improve the workflow of taking a piece of business logic and moving it around.

So I always give the example of writing a case statement in a workbook. I can then make a decision about whether that case statement should be pulled out as a virtualized field. And then, yes, we can make another column in our data warehouse if we want to. But starting in the data warehouse, writing the case statement there, and trying to predict the whole world of things that your users need to do, has been the trend with the growth of dbt: let's materialize everything, and there are only reporting tables. I think we're trying to soften that balance so that users can move quickly.
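
To make that lifecycle concrete, here is a sketch under a hypothetical schema; the staging of the same CASE expression is the point, not the specific names:

    -- Step 1: an ad hoc CASE statement written directly in a workbook query.
    SELECT
      CASE
        WHEN mrr >= 10000 THEN 'enterprise'
        WHEN mrr >= 1000 THEN 'mid-market'
        ELSE 'self-serve'
      END AS customer_tier,
      COUNT(*) AS accounts
    FROM accounts
    GROUP BY 1;

    -- Step 2: the same expression is promoted into the shared model as a
    -- virtualized field, so every workbook can reference customer_tier
    -- without repeating the CASE.
    -- Step 3 (optional, later): materialize it as a physical column in the
    -- warehouse, e.g. via a transformation tool like dbt.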

Kyle: The workflow of designing the model perfectly and then having people do analysis has, in my experience, always been a bit of a pipe dream. A lot of folks probably build some sort of a model, people start using it, they run into the edges of the model, and they build those edge cases into their queries. This is not a thing you're actively thinking about, right?

You're just like, “Okay, here's what I have to work with, here's what I'm going to do in my query.” And then at some point that's been repeated enough times that someone says, “Okay, time to fold it back into the model now.” So you're just shortening that round-trip cycle and taking steps out of the process.

Colin: Yeah, that's exactly it. We're literally just trying to build the product that we wish we had for doing exactly that. I need to put out reporting tables, but I know that they're going to be wrong sometimes, and then I need to refine them over time.

And I think there's another version of this, which is just that the data team can't get to every data set that you have. Only putting 10 reporting tables into your users’ hands means that you're massively limiting self-serve analytics for an organization.

But the flip side with a product like Looker is that the onus is still on the data team to publish things out. And that's why these workbook products pop up. With SQL, you have access to the full data warehouse. In Tableau, worst-case scenario, I can find a CSV, dump it into my workbook, and completely route around the data team.

Our idea is, instead of creating this friction, can we actually build on-ramps and off-ramps for encoding SQL? Users can still go as fast as they possibly can, but we can actually govern it over time.

Kyle: Okay, so we've talked about the ability to push down, from the workbook to the semantic layer into the materialized layer. There's also the question of when we do such pushdowns. I did see some language alluding to the idea that you're going to suggest or recommend these pushdowns, or otherwise do them somewhat automatically or intelligently. Is that real? How granular do you think these recommendations can get?

Colin: It's definitely not real yet. I think this is the great fallacy of data people: that there is a correct answer for how to manage your metrics. My point of view is that this will be evolutionary over the life of a company.

When I was at HotelTonight, for example, and this was probably not a great idea, I just gave every single person admin access to Looker. Anyone could do what they wanted and make any metric that they wanted, because the trade-off of moving really quickly was worth more than the trade-off of control and correctness.

In that world, everything in a workbook would immediately be promoted down into a core model. And I'd put materialization aside for now. When I look at mature companies, I see products like Spectacles, which is built on top of Looker, or Bigeye, looking into how data is tested at scale. And I think that mature organizations fundamentally need to do completely different things. Putting a metric in your data model is incredibly important because it has organizational impact across thousands of people that communicate less well.

And so in that scenario, the idea of leaving metrics in workbooks can actually be very powerful, because we can let people do analysis and things can exist everywhere. But maybe the data model is intentionally smaller than it would be in a mature organization. A lot of the challenge that we saw with Looker is that the model monotonically increases over time; literally every new analysis produces a field in the model.

And 10 years later, you've got 6,000 fields in your data model, and it's not really a true data model anymore. It's just sort of a dumping ground. Then you delete everything and start over again. So my version of this is almost thinking about it like a crank: how aggressive do we want the system to be about telling you to pick fields up out of workbooks and push them down? And what tools can we build to say, “These 10 metrics look similar, but maybe don't push all of them down; try to align these workbooks together”? We have not done any of that work yet, other than the ideation. So I view that as almost the whole product over time. But it needs to be a good BI tool first.

For now, we're solving those interactions with humans. Then the idea is to understand what humans are doing and slowly put algorithms around them. And again, we're not talking machine learning, but more “if, then” statements, matured over time, so that you've got tools sitting there to help.

Kyle: I assume you could get to some reasonable MVP here where it's like, “Do we see this case statement verbatim in more than five queries from five different users?” If so, that seems like a great candidate, at least to surface to the user and say, “Would you like to push this down?”

Colin: Exactly right. Or just a single workbook that everyone is using. If you have a workbook called Corporate Dashboard and it's accessed by 1,000 users, that's probably a pretty good candidate. You want to look at the business logic that's in it and actually understand that there are canonical metrics in it.

So there's no rocket science behind any of this stuff. It really is just: how do we build that workflow before you need to encode all of those things, and then nudge you towards scalability over time?
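
A minimal sketch of what such a nudge could look like, assuming a hypothetical query_log table that records each expression used in workbook queries (nothing here is Omni's actual implementation):

    -- Find expressions that recur verbatim across many users' workbook
    -- queries; these are candidates to promote into the shared model.
    -- The query_log table and its columns are illustrative assumptions.
    SELECT
      expression_sql,
      COUNT(DISTINCT user_id) AS distinct_users,
      COUNT(*) AS occurrences
    FROM query_log
    GROUP BY expression_sql
    HAVING COUNT(DISTINCT user_id) >= 5
    ORDER BY occurrences DESC
    LIMIT 20;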

Materialization is just another version of this: which metrics do you want to pull out of the virtualized modeling layer and actually put in the data warehouse? I see examples where people want to encode every single dimension in their data model in the data warehouse. And that's not difficult to do.

But you're going to want to make decisions about how frequently you want to be rebuilding those tables, and whether that actually creates more confusion than it removes, when you've got 100 case-statement columns for every single type of filtered measure, or something like that. And again, I don't think there's going to be a right or wrong answer. It's going to be stylistic for different teams. But we can help enable it.
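
Pushing one of those virtualized fields all the way down might look like a small dbt model, rebuilt on whatever schedule you choose, which is exactly the cadence decision being described. The file name and ref below are hypothetical:

    -- models/dim_accounts.sql (hypothetical dbt model)
    -- Materializes customer_tier as a physical column, computed on the dbt
    -- build schedule instead of at query time.
    SELECT
      account_id,
      mrr,
      CASE
        WHEN mrr >= 10000 THEN 'enterprise'
        WHEN mrr >= 1000 THEN 'mid-market'
        ELSE 'self-serve'
      END AS customer_tier
    FROM {{ ref('stg_accounts') }}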

Kyle: When you were describing the pushdown layers, I was thinking of an L2 cache on a processor, then RAM, and then writing to disk as the last resort, with metadata being exposed through your APIs to other tools.

I'm obviously thinking about my own context at Bigeye: being able to consume and understand when a metric or any other macro was pushed all the way down to the materialization layer. That's a really strong signal in my world that we would want to deploy anomaly detection models over that metric and measure it for the user. So the information that you're on the front line of gathering could produce an enormous amount of value for a bunch of other products in the data stack.

Colin: I think that's exactly right. There are going to be different freshness constraints and quality constraints on all of these things. Again, I think this is one of the hard problems that we had with Looker: it was very difficult to make a single model where some things were mission critical and other things were more loosely defined. And so we're intentionally leaning into this world where sloppy things can exist, other things can be very firm, and they can coexist with each other.

Because when you do try to bang everything into the same system, it creates this tension where either you're sacrificing agility, or you're creating this really heavyweight process on top of a person who just wants to create their own little region bucket, because we're doing a field sales motion and we want to see where our customers are across eight states. That balance is what we're really trying to nail.

The other really, really important concept here is that so much of Looker was this mix between the software and how it was implemented. I think one of the hardest things about data analysis in general is that there's so much style that needs to get applied to every different product. For that, we're really just trying to learn how people use us, and thus what needs to layer on top. How do we hook into observability so that it just makes sense? What do people need to make decisions? Because we don't need to control the decisions. I just want to make the tooling fluid, and then let smart people use it.

Kyle: Developers are going to come up with their own paradigms and best practices. They'll surprise you, right?

Colin: Yeah, hackability was a big part of our success. We didn't always envision what people were going to do, and then smart people figured stuff out.

Kyle: Well, Colin, Omni's super interesting. I definitely want to get my hands on it at some point. But it is time for rapid-fire questions. Number one: Batman versus Iron Man. Who would win?

Colin: I mean, I think Batman’s probably winning a fist fight. But if we're fully geared up, I've got to favor Iron Man. He just seems to have better tech.

Kyle: All right. Number two, what is your perfect computer monitor configuration?

Colin: I literally don't know what that means, which also means I'm probably not configuring my computer monitor. It's whatever the 12-inch MacBook Air gives me by default; I'm a religious non-configurer. I do whatever the default is in the system, and I shrink every web browser page to, like, 60%.

Kyle: You're doing all your work on a laptop?

Colin: 100%.

Kyle: Wow, okay. All right. Number three, the last one, and possibly the most controversial. Are phones too big these days?

Colin: Yes. And this actually ties into my last answer. My favorite device of all time, the original iPhone SE, is, I think, the smallest iPhone that ever got made. I went through three of them before they dropped operating system support. I've actually written a letter to Apple asking them to make smaller phones, but we've lost that battle.

Kyle: I love my iPhone Mini. I'm very sad to see it go. It does seem like the world, at least on the Apple side, is moving on. I don't know what's up in the Android ecosystem. Maybe they still make some small phones. They look pretty big.

Colin: No, they don't. No one is making small phones. It's kind of unbelievable. But I don't know, I'm a one-handed phone user, which I guess is not common. Or people have gigantic hands.

Kyle: We have a link to Omni down in the description below if you want to check it out. Colin, are you in invitation mode? Or if someone wants to try out Omni, they can go to your site. And what happens after that?

Colin: It's not open on the web yet, but we're walking people through onboarding right now, just as the platform gets built out. We should be open on the internet probably in Q1. But for now, you can fill out a form, and you'll probably talk to me about whether it's a good fit.

Kyle: Sounds super cool. I can't wait to try it out. Like I said, there are a lot of tools in the stack right now, but I really haven't seen one like this, with the pushdown model. It's super interesting. I'm excited to see you guys take it further. So Colin, thanks for being on the show.

Colin: Thanks for having me.

Kyle: All right, everyone. We'll catch you next time on the next episode of The Observatory. Thanks for watching.
