The Observatory
-
April 28, 2023

Dawn Woodard, LinkedIn

Dawn Woodard, distinguished data scientist at LinkedIn, and Kyle Kirwan, CEO and co-founder of Bigeye, discuss data science careers, managing teams, and more.

Read on for a lightly edited version of the transcript.

Kyle: Hello everyone. Welcome back to The Observatory. I'm your host Kyle, and today I'm chatting with Dawn Woodard, who's a distinguished scientist at LinkedIn. Previously, she was a senior director of data science at Uber, where among other things, she worked on pricing and maps, including routing and ETA prediction.

She's worked on experimentation platforms for both A/B testing and marketplace-level experiments, and other technologies that are really foundational to Uber's Rides, Uber Eats, and Freight businesses. She’s been in ML and AI for over 15 years. She began her career doing research as a professor at Cornell.

So Dawn, welcome to the show.

Dawn: It is fantastic to be here, Kyle. It's great to see you again after many years. We worked together at Uber, so this is a wonderful reunion!

Kyle: Yes, great to see you again, and I'm excited to catch up on what you've been doing since then. So you're a distinguished scientist at LinkedIn.

I think that is a pretty rare role that many data scientists would probably love to get to at some point in their career. But maybe a lot of folks are not familiar, myself included. What does your role entail, and how would you say it's different from other roles that you've had in data science in the past?

Dawn: This is a dream job for me. I'm loving it at LinkedIn. It's a shift from my role at Uber where I was leading data science and machine learning teams from the management side.

The key role of a distinguished engineer or other kinds of senior individual contributors is that we do technical leadership for particularly challenging or cross-cutting initiatives.

So, for example, right now at LinkedIn, there’s a major investment in embedding-based retrieval for use in search and recommendation, and in representation learning, really across the spectrum, to create these personalized experiences. And this is the initiative that I'm driving at LinkedIn. It involves eight different teams across data and AI and infrastructure.

And so projects that have that kind of technical or organizational complexity would typically be the kind of thing that we would engage with.

Kyle: Is that different in the ways that you have to actually get the work done? Or do you find that your day-to-day in this role is fundamentally different from roles you've had previously?

Dawn: Definitely. It's considerably more technical than the later stages of my roles at Uber, where my teams had gotten pretty broad across a range of spaces. At some point, as my team got larger and the scope got broader at Uber, I was less able to engage with some of the technical decision-making because my role was to enable other technical contributors, as opposed to doing the work myself or ideating myself.

And it's really wonderful to get back into the technical weeds. This feels a bit more like my role at Cornell, in some ways, although obviously less academic, and much more immediate in terms of the impact. But it’s really fun and really technical.

Kyle: I’ll ask you later in our chat about the differences in the data science org and culture at LinkedIn.

Obviously every company's got its own culture within data science, so I'm curious to hear about what that experience has been like at LinkedIn. You've been there for almost a year. You mentioned that in the management role you had at Uber that you weren't as technically in the weeds, it was much more about operating the teams.

Whereas in this role, you've gotten to go back to that and get into the weeds. Is that a function of the culture or the workplace or is that a conscious decision that you made in your career path? Did you decide to switch from being more in the management track to more in this technically-focused role?

Dawn: Right. It was a career choice that I made. Some companies have these distinguished engineer and other very senior individual contributor roles. LinkedIn has them. I think it's a great role for a company to have. Not every company invests in this type of leadership.

Often the investment is more on leaders who are managing large teams, for example. But I think this role is really special and it really is able to bring that deep thought about what we're doing across our different teams, how those things fit together, and whether we're using the right technique in the right place.

Happy to talk about the data org at LinkedIn. That's also one thing that really drew me to LinkedIn.

Kyle: Yeah, let's hear more about that. How would you describe it? What’s unique about it? I think you've been there for close to a year now. What's your impression?

Dawn: So one thing that's great about LinkedIn: data science and machine learning are all within the same organization. That’s really valuable because it's incredibly clear where the machine learning modeling is done and the implementation of the models, for example, production implementation.

At some companies, data science’s role is ambiguous. Sometimes data science and engineering are in different organizations. Is machine learning done in engineering? Is it done in data science? Sometimes it's a mix of the two, and so putting it all together under one roof ensures we’re clear about who does what.

Kyle: It's more verticalized. And when you say data science, we're really specifically talking largely about modeling. We're talking about machine learning. I know that the term has broadened a lot, especially over the last 10 years or so. So to be clear, we're talking about the teams that are gonna create models, get them into production, and that's all one holistic group at LinkedIn.

Dawn: Actually, data science is broader than that at LinkedIn. As at many companies, data science has three different specialties. At LinkedIn, one is more focused on data analysis and interpretation. The second is around data engineering, and the third is around modeling. So yes, modeling is an important part, but it's not the only part.

And then we also have machine learning engineering within the data org. So a mix across these different specialties actually. For example, we have economists as well, and those would fall into the data science modeling subcategory.

Kyle: Very cool. And actually that segues into another question that I had, which is, we talked about this in the introduction, but you've worked on a bunch of different problem spaces.

Pricing obviously was super critical to Uber's entire business. Anybody that's operating a priced marketplace, obviously that's core. You also worked on mapping, which is pretty specific to their business: mapping and routing. How do you design the route of a trip? But you've also worked on A/B testing, you've worked on forecasting, so essentially a ton of different areas in data science. So I wanted to ask about the experience of playing within all of these different problem spaces.

Dawn: I love the opportunity to play in these different sandboxes. Some of that is different application areas, like machine learning applied to ETA prediction and optimization applied to routing, for example. Some of that is creating more platform technologies like A/B testing and other types of experimental designs for evaluating the impact of changes.

And it's just so fun to be able to learn from all of these different areas. We have these subject matter experts who know so much about maps and map data, and they are just constantly willing to teach and walk me through things several times, until I really understand. And so it's really incredible to be able to work with people who are so knowledgeable about a particular application and they're excited to invest because we're building out a machine learning technology for them.

Kyle: So you are sort of bringing the technical aspect: these are tools that we have, these are things that we can do. And then they're bringing the subject matter, or the definition of the problem: what are the challenges involved in building an optimal route?

And then you're meeting in the middle to solve a practical challenge for the business. Is that a good way to summarize?

Dawn: Absolutely. And over time, I learned the subject matter and the application area. But there's always more to learn, and that's basically how we break out responsibilities.

Kyle: Any of those that you can claim as a favorite that you've worked on so far?

Dawn: Oh, I love maps. It's really a lot of fun. There's something very visually engaging about maps. It's an incredibly challenging machine learning problem. They say truth is stranger than fiction, and I would say that this is absolutely true for maps.

The reality of the physical world, and how people interact in it, is more complicated than a purely virtual world in many ways. It's so hard to build a mapping system where you get every point of interest correct: the name of the restaurant, the exact location, where the parking is, which streets are one-way versus two-way.

And you have to get every detail right, because these are safety considerations. You have to create efficient routes based off of traffic that's happening right now. So how do you do that in a setting in which there's sparse data on many roads, for example? Maybe you don't have recent information, but in places where you do have data, how do you take advantage of that?

So just a really fun machine learning problem. It’s really not standard, because it's a prediction on this manifold, which is the road network. It's not a classic prediction problem where you have a set of inputs and it goes to an output, which is the ETA prediction. You can do ETA prediction in that way, but you won't do as well as the best mapping platforms.

You really have to get in the weeds on what's happening on individual roadways, and how is that different from what's normally happening at this time of week? What are the really granular patterns?

Kyle: Sounds like a messier than usual problem. Maybe that's where part of the interest comes from.

Dawn: Absolutely. So what is the right architecture in such a context? Right? It’s not obvious. How do you model at the level of individual road segments, but also think big-picture? You're trying to predict ETA from one side of the city to the other side of the city. So you shouldn't lose track of these macro effects like, is there an overall congestion effect happening right now? But at the same time, you also need to understand the dynamics in a very detailed way.
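
As a rough illustration of combining segment-level detail with a city-wide effect, here is a minimal Python sketch. It is a toy under stated assumptions, not the architecture used at Uber or any mapping platform: the route, the speeds, and the names `segment_minutes`, `route_eta_minutes`, and `citywide_factor` are all hypothetical. Each road segment uses its own recent speed when one is available, and falls back to its typical speed scaled by an overall congestion factor when recent data is sparse.

```python
# Toy ETA sketch. All numbers and names are hypothetical, for illustration only.
# Each segment: (length_km, typical_speed_kmh, recent_speed_kmh or None).
ROUTE = [
    (1.0, 30.0, 20.0),   # recent data shows this road is slower than usual
    (2.0, 60.0, None),   # sparse data: no recent observation on this road
    (0.5, 30.0, 30.0),   # recent data matches the typical speed
]


def segment_minutes(length_km, typical_kmh, recent_kmh, citywide_factor):
    """Travel time for one segment, in minutes."""
    if recent_kmh is not None:
        speed = recent_kmh                      # granular, segment-level signal
    else:
        speed = typical_kmh * citywide_factor   # fall back to the macro signal
    return 60.0 * length_km / speed


def route_eta_minutes(route, citywide_factor=1.0):
    """Sum segment travel times along the route."""
    return sum(
        segment_minutes(length, typical, recent, citywide_factor)
        for length, typical, recent in route
    )


eta_normal = route_eta_minutes(ROUTE)                           # no city-wide slowdown
eta_congested = route_eta_minutes(ROUTE, citywide_factor=0.8)   # city at 80% of typical speed
```

In this toy, the city-wide factor only changes the estimate on the road with no recent observations, which is one simple way to keep the macro effect in view while still trusting granular data where it exists.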

Kyle: Very cool. And as you've gone through these different problems throughout your career that you've gotten to tackle, I'm sure that just the general AI and ML space, the techniques and the tools available have changed throughout that journey for you. To reiterate: you started your career doing research in machine learning. So we're talking about fundamentals all the way through to the most global scale form of putting it into practice.

So there’s a big breadth of experience there. What has shifted or what's changed in the landscape of machine learning and AI the most over the last decade, and where do you see it continuing to evolve?

Dawn: So there's been an enormous expansion in the use of machine learning methods across areas of our life where we didn't really expect it.

That to me is the biggest change. I expect to see much more of this in some traditional industries moving forward. So looking back, something like taking a ride from point A to point B was based on some very basic technology. Then we started bringing in dynamic pricing and dynamic dispatch technologies that optimize over the whole network, and dynamic carpools to create efficiency.

Bringing that kind of technology into a space that was previously dominated by taxi companies, for example, was such a radical change. And then similarly, if you look at search technologies, there were plenty of search technologies a decade or two ago, but those were based more on keyword matching or large repositories of information and pulling from those repositories in clever ways.

There were very specialized methods for search and web search, and what you've seen is a real embrace of representation learning and embedding-based retrieval technologies in order to power these. How do we implicitly understand the meaning of a search query, rather than trying to just pull out a keyword and explicitly understand what the user is asking for?

And then how do we understand the content of potential results in a really rich way and provide the right set off of that? So you see this evolution from a very specialized, keyword-based approach to search to a deeply machine learning-based approach to search and search systems.

So that and the broadening of machine learning has really been dramatic. And looking forward, I see that really expanding into real estate, into government, and public transportation, and into the automotive industry, and healthcare. These sectors where, yes, there have been initiatives around bringing machine learning into these spaces, but it's really slow. And these projects take decades, right?

Really heavily-regulated sectors or sectors that are driven in the public domain often need public-private partnerships in order to really bring in the right set of dispatch technology. For example, in public transportation, we still have fixed bus routes and fixed schedules in most places, right?

And we'll continue to need those, but we really need to supplement with dynamic dispatch of vehicles based on demand. In order to do that, you're starting to see the emergence of public-private partnerships like Uber with public transportation companies, or the embrace of some of these software packages developed in the private sector by governments. So that's one example.

In the real estate area, you really see this expansion of real estate as this commodity that's liquid, and where there's a true marketplace. Opendoor’s on the cutting edge of this, for example, and there are a couple other companies. But instead of having this heavy overhead associated with each transaction, this is seen as a liquid commodity for which there’s a marketplace.

Kyle: So, if I can touch on a couple key things that I heard there. One is obviously just the gradual march of machine learning into more and more aspects of society frankly, and everyday life. More than what we thought would be possible a decade ago.

But then you also mentioned being able to get deeper at what a human being is actually looking for when they're executing a search query.

Are we going beyond surface level applications of machine learning, in order to get deeper and closer to the model understanding what humans are actually trying to do and what they're actually trying to look for, rather than an approximation?

Dawn: Yeah, I would say it's the movement from explicit representations to implicit, machine-learned representations. So an example of an explicit representation would be a database of movies and the actors in those movies. And that can be used to power search via a keyword-based approach, for example.

Contrast that with understanding the intent of a user's search in an implicit way through representation learning, and then trying to match appropriate content. So that really allows for a much larger space of potential queries, and a better match in the sense of being able to address intent as opposed to words that can be misleading in some cases.

Also it's much easier to keep that up to date than if you have this explicit database of movies and you constantly have to be adding to it. The best solutions involve a mix of both. But we’ve moved away from the world in which you have to have large taxonomies with explicit databases and human annotation, and into the world in which a lot of this is more automated and more implicit, through embedded representations.
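
To make the contrast between explicit keyword matching and implicit embedded representations concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three-title catalog, the hand-made three-dimensional vectors (a real system would learn them with a model), and the function names `keyword_search` and `embedding_search`. It is not LinkedIn's retrieval system.

```python
import math

# Hypothetical toy catalog. The 3-d vectors are hand-made stand-ins for
# learned embeddings; a real system would produce them with a trained model.
CATALOG = {
    "Space Rescue": [0.9, 0.1, 0.0],
    "Love in Paris": [0.0, 0.2, 0.9],
    "Galaxy Outlaws": [0.8, 0.3, 0.1],
}


def keyword_search(query, catalog):
    """Explicit approach: titles that share at least one word with the query."""
    words = set(query.lower().split())
    return [title for title in catalog if words & set(title.lower().split())]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embedding_search(query_vec, catalog, k=2):
    """Implicit approach: the k titles whose vectors are closest to the query."""
    ranked = sorted(catalog, key=lambda t: cosine(query_vec, catalog[t]), reverse=True)
    return ranked[:k]


# The query shares no words with any title, so keyword matching finds nothing.
keyword_hits = keyword_search("sci-fi adventure", CATALOG)

# Assumed output of a query encoder for the same query.
embedding_hits = embedding_search([1.0, 0.1, 0.0], CATALOG)
```

Here the keyword search returns nothing because no title contains the query's words, while the embedding search still surfaces the two space-themed titles, which is the sense in which the implicit approach addresses intent rather than wording.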

Kyle: It sounds less brittle and less finicky.

Dawn: Yes. More scalable, much easier to keep up to date. More accurate. Much more accurate.

Kyle: You talked a little bit about where you see the field going in the future. For folks who are at an earlier stage in their career, who will help create that future, what advice do you think you'd give to someone who is very early in their career in data science?

Dawn: One piece of advice I always give is that it's easy to focus a hundred percent on the models. What's the most sophisticated model? What's the latest model? Coming at a problem from a purely conceptual perspective where you say, “This is the model we're using, how do we loosen those simplifying assumptions?” This is a cognitive pattern that I often see with people who are just coming out of school. They want the more sophisticated model. They wanna relax the assumptions.

But what I would say is, understand your data, understand whether the input data is really representing what you think it is. Maybe you're missing some important signals, so pay attention to the data and not just the model. That's super, super important.

And anybody who's been in the field for five plus years has come to that understanding, but I just suggest getting there faster. Understand the data, understand your domain, what problem you're trying to solve, and then pick the right solution for the problem.

So the way I like to think about it is, I write down my ideal solution to a problem that's probably overcomplicated. It's probably something that's too rich of a model, right? And then step back and say, Why did I pick this model? What are the two to three most important aspects of the problem that I need to capture in my model?

And then simplify way, way down to say, I just need a model that captures these few things. It needs to be super robust in production. It needs to work in many, many different corner cases. Simpler is better from that perspective. It's also more maintainable.

That's one piece of advice on the technical side, understand your data. Start with the simplest solution. Make sure you're always comparing to a very basic baseline, because it's easy to just throw the kitchen sink at a problem.
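
The habit of always comparing to a very basic baseline can be sketched in a few lines of Python. The numbers below are made up for illustration, and `model_preds` is a placeholder standing in for the output of some more sophisticated model; the baseline simply predicts the training mean every time.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical trip durations in minutes (made-up data).
train_durations = [10, 12, 11, 13, 14]
test_durations = [12, 11, 13]

# Placeholder predictions from some more sophisticated model (assumed values).
model_preds = [12.5, 10.0, 14.5]

# Very basic baseline: always predict the mean of the training data.
baseline_value = sum(train_durations) / len(train_durations)
baseline_preds = [baseline_value] * len(test_durations)

baseline_mae = mean_absolute_error(test_durations, baseline_preds)
model_mae = mean_absolute_error(test_durations, model_preds)

# Only prefer the complex model if it actually beats the baseline.
model_beats_baseline = model_mae < baseline_mae
```

In this made-up case the complex model does worse than the constant baseline, which is exactly the kind of result that stays hidden if the baseline is never computed.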

Kyle: We talked about this earlier in the chat. You made this conscious move. You were managing a number of teams at Uber. You had exposure to pretty fundamental pieces of their business. You then decided that you wanted to take this highly technical, very senior role at LinkedIn.

Maybe you can speak a little bit also to folks who are earlier in their career. How would they think about their career track or their trajectory and the different directions that they could go? Even within data science or machine learning, there's many different places to go in one's career. What advice would you have for them on that?

Dawn: Great question. It's really easy to think that going into management is the only path to continue to grow in your career. It’s easy to think that the best people all go into management. It's absolutely not the case. It's just a misconception, and really talented individual contributors are gold in the industry. Everybody wants them.

It’s also an incredibly fun career trajectory to have. So I just really encourage people to consider a broad spectrum of potential career paths from management on one side through to very senior individual contributor on the other side.

And there are some things in between. We had this tech lead manager role at Uber, which was a role where you could continue to grow in your career, have a couple of direct reports, and never really grow your team. But really get deeper in terms of the quality of the technical solutions that you're bringing to the table. So more like an individual contributor, with a small number of direct reports.

So there's a spectrum there. And don't forget about the other side of that spectrum. And in particular, even if you want to go into management, take your time in an individual contributor role first. It's very easy for companies to have pressure to move people into management roles, and so you may have opportunities to do management very early in your career.

It's not always the best move for you, professionally. It's really important that you understand how to do before you go teach. Having those hands-on skills is something that lays a foundation for the rest of your career. So don't rush into management.

Kyle: Sounds like sage advice. You mentioned there's the straight-up IC role, at some companies there’s a hybrid technical lead, and then there's pure management. Taking your time makes a lot of sense. And being aware of the spectrum that’s available is important as well.

Well, Dawn, are you ready for some rapid fire questions before we wrap up for today?

Dawn: Let's do it. Sounds like fun.

Kyle: All right. Number one. So you’re obviously at LinkedIn. I love reacting to things on LinkedIn in the feed. Whenever I have an opinion, there are seven LinkedIn emojis. Fun fact, for anybody who didn't know. Of those seven, which one is your favorite?

Dawn: Well, of course I love the heart emoji. I come across at work as somebody who's very direct and very focused on the technical work. But I just love seeing people's career growth as it surfaces in my feed, and seeing some of the incredible accomplishments that people bring to the table, and really celebrating those.

Kyle: Question number two. Pricing algorithms. You've worked on them. What's one thing you wish that everybody knew about pricing algorithms?

Dawn: It’s easy to not understand what dynamic pricing is trying to do if you're a consumer. It’s supply and demand. If it's done well, if it's in a context of a marketplace that has multiple competitors, it really is not the company trying to extract more money or trying to price gouge. Because honestly, companies that do that don't survive. They can't compete, and they can't give a price that's competitive. And so dynamic pricing really is about how do you balance supply and demand in such a way that really balances the marketplace? And so, I think that’s a really important conceptual distinction for people to understand.

Kyle: Last question. You have a six-month sabbatical, but you have to stay in one place the entire time. What would you do with it? Where would you be?

Dawn: Well, I would definitely go to the Swiss Alps. I'm a big lover of mountains and landscapes like that. I think that would be really fun to escape, and bring my family and friends with me.

I would spend the time digging into LinkedIn data. I often don't have a chance to do that on a day-to-day basis. I’d just play around and see what I find, and also just do a lot of reading because day-to-day work gets so busy that it’s easy to not have the time to learn the latest methods or what other companies are doing in your space. And so I would love to just expand my understanding in this way.

Kyle: And the Swiss Alps sounds like a great place to do that!

Well, that's it for the rapid fire questions. Dawn, it's been awesome. We haven't chatted in a while. It was great to catch up with you again and hear about your work at LinkedIn. So thanks for joining us today on The Observatory.

Also, we’re going to include a link down in the description to your website, which is woodardscience.com, for folks who wanna learn more about Dawn.

And I believe you are hiring at LinkedIn for senior individual contributors and senior level managers. And this is in that machine learning group. Is that correct?

Dawn: Absolutely. Thank you so much!
