Thought leadership
October 31, 2023

Data Leadership 101: Tips and Tricks for New Leaders

For new leaders who were recently promoted or offered a new role, suddenly being responsible for the success of an entire team can be daunting.

Egor Gryaznov

With the increasing adoption of data and analytics across companies, data engineering has become a sought-after skill. As companies scale their data engineering teams, strong leadership is required to help set the vision, build the team, and deliver impactful data products.

For new leaders who were recently promoted or offered a new role, suddenly being responsible for an entire team can be daunting. In this article, we'll share easy-to-implement tips and strategies to help you succeed as a new data leader.

The Role of Data Engineering Leadership

Data engineering leaders have a multifaceted role - they need to understand both the technical and managerial aspects of building high-performing data teams.

Their core responsibilities include establishing the organization's data strategy and vision, recruiting and selecting data engineering talent, and providing ongoing guidance and mentorship to data engineers. They are also responsible for advocating for best practices in data engineering, overseeing the development of data infrastructure and platforms, and aligning data engineering endeavors with key business objectives.

Hiring & Managing Your Data Team

Assembling High-Performing Data Engineering Teams

Data teams tend to evolve from small, centralized groups into specialized competencies distributed across an organization. As the value of data becomes apparent, bottlenecks emerge in having a single analytics/data team. This requires thoughtful splits into verticals like analytics, data engineering, and infrastructure. Eventually, data skills need to propagate so that each business domain has its data experts focused on their specific needs. 

This is generally the point at which you start thinking about how to best break up responsibilities. One way to think about this is vertical vs. horizontal teams. Vertical teams have a narrow focus, usually on one specific group of users or a single domain area. For example, a vertical team could be dedicated to historical metrics or revenue.

By contrast, horizontal teams have a wider focus on providing data and support to numerous downstream teams across the organization. For example, an analytics horizontal team may provide reporting and dashboards to various product groups, marketing, finance, and others. Their goal is to enable broad access and usability of data.

Infrastructure teams are also horizontal by nature since they provide foundational data systems for all teams rather than optimize for one particular domain.

Fostering a Culture of Collaboration

Collaboration is vital to data engineering team success. Regular gatherings help: at Datadog, for example, bi-weekly meetings drive tool consistency and shared best practices. A "Data Engineering Guild," like the one at Spotify, encourages knowledge sharing and standards alignment even among teams that don't normally work together.

Shadow data teams emerge when data scientists, under pressure to deliver data features or products quickly, bypass established data engineering processes to access and manipulate data directly. This has been enabled by the rise of tools like dbt that empower data scientists and analysts to create data pipelines themselves.

Shadow data teams tend to result in data debt, since the data scientists/analysts inevitably create artifacts (temp tables, duplicated metrics) along the way that end up increasing the data stack’s maintenance cost.

While some shadow data teaming is inevitable at a fast-growing startup, there are ways to mitigate its effects:

  • Hire data engineers and data quality engineers before hiring data scientists.
  • Invest in self-service dashboards that will help business stakeholders answer common questions themselves.
  • Invest in data quality initiatives and monitoring that mean data scientists don’t have to spend as much time debugging data issues.
  • Establish data catalogs and other metadata stores that can help data analysts/data scientists find the data they need.
  • Establish a set of curated analytics datasets that data science teams can work from, with strict SLAs, lineage tracking, and quality guarantees (see the sketch after this list).
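
To make that last point concrete, here is a minimal sketch of one such guarantee: a freshness SLA check on a curated dataset. The table name, SLA window, and connection string are hypothetical, and in practice a check like this would live in your orchestrator or a data observability tool.

    # Minimal freshness-SLA check for a curated analytics dataset.
    # The table name, timestamp column, and 24-hour SLA are hypothetical.
    from datetime import datetime, timedelta, timezone

    import sqlalchemy

    SLA = timedelta(hours=24)  # curated data must be less than a day old
    engine = sqlalchemy.create_engine("snowflake://...")  # your warehouse URL

    def check_freshness(table: str, ts_column: str) -> None:
        with engine.connect() as conn:
            latest = conn.execute(
                sqlalchemy.text(f"SELECT MAX({ts_column}) FROM {table}")
            ).scalar()
        # Assumes the timestamp column is stored with a timezone.
        age = datetime.now(timezone.utc) - latest
        if age > SLA:
            # In practice: page the owning team and fail the pipeline run.
            raise RuntimeError(f"{table} is {age} old, violating its {SLA} SLA")

    check_freshness("analytics.daily_revenue", "updated_at")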

Managing and Motivating Your Team

Ultimately, autonomy, mastery, and purpose drive high-performing data engineering teams.

It is essential to empower data engineers to make independent decisions, encourage skill mastery, and emphasize how their contributions advance the team's mission. Managers facilitate this by not micro-managing, providing challenging projects, and creating an open culture where feedback is welcome.

Hiring for Success

The two key elements to look for in the hiring process for a data team are technical competency and cultural fit.

For assessing technical skills, there should be specialized interview processes for different data roles like data engineer, data reliability engineer, and data analyst. The interview questions should be tailored to probe the candidate's capabilities in areas like large-scale data systems, writing ELT jobs, Spark, etc., based on the role.
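
For example - this is a hypothetical exercise, not a prescribed question bank - a data engineer candidate might be asked to deduplicate an event table in Spark, keeping only the latest record per key:

    # Hypothetical Spark interview exercise: keep the latest record per event_id.
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dedup-exercise").getOrCreate()
    events = spark.read.parquet("s3://example-bucket/events/")  # placeholder path

    # Rank records within each event_id, newest first, and keep rank 1.
    latest_first = Window.partitionBy("event_id").orderBy(F.col("event_ts").desc())
    deduped = (
        events
        .withColumn("rn", F.row_number().over(latest_first))
        .filter(F.col("rn") == 1)
        .drop("rn")
    )
    deduped.write.mode("overwrite").parquet("s3://example-bucket/events_deduped/")

A strong candidate can explain why a row_number window beats dropDuplicates here (it deterministically picks which record survives) and what the shuffle implications are at scale.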

In addition to technical skills, cultural fit should be evaluated during hiring. Enthusiasm and passion for the company's mission and products help align new hires with the existing team's energy and values.

To evaluate technical and cultural strengths, we suggest involving team members from diverse disciplines in the interview process. This allows you to assess technical skills, communication ability, and culture holistically.

Once new hires have joined the company, they should be integrated through training layered on real work, networking events, and mentorship programs. For example, Spotify's “Data University”, a data engineering curriculum led by senior data engineers, helps new joiners build their skills. Documentation on existing architecture, pipelines, and roadmaps can also help situate new team members quickly.

Creating Career Paths for Data Engineers

With demand for data engineering skills high, companies need to focus on retaining high-performing data engineers by providing career growth opportunities and paths to advancement.

A clear career ladder with multiple levels - e.g. associate data engineer, data engineer, senior data engineer, staff data engineer - helps engineers understand what is expected to progress to the next level. You should also evaluate whether it's appropriate to split the ladder as distinct roles within the data team begin to crystallize.

For example, a data engineer who spends much of their time debugging data issues and working on reliability may not be “performing” at a promotion-worthy level according to the data engineering ladder rubric, but that doesn't mean their work isn't valuable. Instead, create a separate ladder and role for them: the data reliability engineer.

Balancing Technical and Leadership Responsibilities

As data leaders progress in their careers, their responsibilities shift from being primarily technical to requiring more "soft skills" like communication, mentorship, and strategic thinking. However, it is still important for leaders to maintain a level of technical fluency. They can do this in a few different ways. 

The most crucial is to deliberately allocate time for learning new tools and techniques - reading new papers, playing around with new tools and open-source libraries, attending meetups and conferences, and talking to engineers with their ears to the ground.

Communities of other technical leaders are also a valuable source of information, since data teams often end up solving the same problems over and over again. We really love the dbt community and Data Quality Camp.

Developing a Data Strategy for Your Enterprise

Creating a Data Roadmap

An impactful data roadmap starts with listening to business leaders to understand top objectives around revenue growth, risk reduction, and operational efficiency. 

After grasping priorities, you should then assess how data is currently used across the organization - identifying silos, culture challenges, and workflow pain points. With these business goals and data use challenges in mind, you should outline a strategy focused on people, processes, and technology rather than jumping to tech-only solutions.

For example, after your initial feedback sessions, you might hear complaints like the following:

  • It’s hard to get basic usage data for our customers and for analytics purposes.
  • The finance team always has to ask data engineers to run queries for them.
  • We are losing millions of dollars on fraud.
  • Data is always late and we don’t trust it.

When writing the data roadmap, the goal is to address as many of these pain points as possible in a generalizable way that still drives toward a long-term vision for data at the organization. For the example issues raised above, the roadmap might include:

  • Pulling data out of S3/Kinesis and into a cloud data warehouse (see the sketch after this list)
  • Training business users on SQL, and building BI dashboards
  • Building a machine learning model to detect fraud
  • Implementing data checks/data observability
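
To make the first item concrete, here is a minimal sketch of landing S3 data in a warehouse using Snowflake's Python connector. The account, stage, and table names are illustrative placeholders, and a production version would run on a schedule inside your orchestrator:

    # Minimal S3-to-warehouse load via Snowflake's Python connector.
    # Account, stage, and table names are illustrative placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="loader", password="...",
        warehouse="LOAD_WH", database="RAW", schema="EVENTS",
    )
    try:
        conn.cursor().execute("""
            COPY INTO raw.events.page_views
            FROM @raw.events.s3_stage/page_views/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """)
    finally:
        conn.close()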

The roadmap should focus on high-priority pain points that align with overall goals around revenue, risk, and efficiency. The aim is to improve access, culture, and workflows in an extensible way that delivers tangible business value.

Scaling Data Infrastructure for Growth

In a rapidly growing scale-up environment, data infrastructure must quickly adapt to increasing demands. While we will not be able to cover everything you need to know about scaling data infrastructure within this blog post, there are some best practices you should keep in mind to ensure that your data systems don't immediately become tech debt:

  • Implement automated and modular pipelines - Automated pipelines (ELT, data quality monitoring) allow for scalability by removing manual steps. Modular design splits pipelines into logical stages, enabling parallelization (a minimal sketch follows this list).
  • Separate storage from compute - Decouple data storage (like data lakes) from processing. This allows you to scale each independently. Fortunately, most modern data warehouses do this by default.
  • Leverage managed services - Cloud-native platforms like Snowflake remove infrastructure management burden. But make sure you have an exit plan too - often after a certain size or data volume, it no longer makes sense to pay for the managed plan, and you end up having to migrate to something built in-house.
  • Establish capacity monitoring - Continually monitor usage trends to predict growth and proactively scale.
  • Establish data quality monitoring - As a company grows, you will inevitably run into challenges around data quality; the earlier you can establish even some basic tooling around monitoring both the data pipelines and the table values themselves, the more those benefits will compound.
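
As a sketch of the first bullet above, here is what modular stages can look like in plain Python. The function bodies and the thread-pool orchestration are illustrative; a real deployment would hand these stages to an orchestrator like Airflow or Dagster:

    # Sketch of a modular pipeline: each stage is a small, testable function,
    # so stages can be retried, swapped, or parallelized independently.
    from concurrent.futures import ThreadPoolExecutor

    def extract(source: str) -> list[dict]:
        # Placeholder: pull raw records from one source (API, S3 prefix, ...).
        return [{"source": source, "value": 1}]

    def transform(records: list[dict]) -> list[dict]:
        # Placeholder: clean and conform the raw records.
        return [r for r in records if r["value"] is not None]

    def load(records: list[dict]) -> None:
        # Placeholder: write conformed records to the warehouse.
        print(f"loaded {len(records)} records")

    def run_pipeline(sources: list[str]) -> None:
        # Because stages are modular, independent sources can run in parallel.
        with ThreadPoolExecutor() as pool:
            for raw in pool.map(extract, sources):
                load(transform(raw))

    run_pipeline(["billing", "web_events", "crm"])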

Evaluating the Best Data Technologies

When evaluating new data technologies, data leaders must first define selection criteria that align with long-term architectural principles and data strategy – key elements include scalability, flexibility, time-to-value, skills-fit, and total cost of ownership.

With criteria defined, you can solicit community input on the best tools and vet options. Luckily, there's an increasing trend toward self-service and freemium software in data, which allows you to play around with many data infrastructure tools for free or at very low cost. This isn't to say that the tool will stay free - it's important to analyze total cost, factoring in licensing, infrastructure needs, and development/maintenance efforts.

Finally, you will inevitably face build vs. buy decisions. Building tools in-house offers customization but requires significant resources. Buying may enable faster time-to-value but can lead to vendor lock-in. Open architectures that prevent lock-in are probably ideal. The optimal decision balances business needs, costs, and technical capabilities.
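
As a back-of-the-envelope illustration of that trade-off (every number below is a hypothetical placeholder, not a benchmark), the first pass at the comparison is often simple arithmetic:

    # Back-of-the-envelope build-vs-buy comparison.
    # All figures are hypothetical placeholders, not benchmarks.
    def build_cost(engineers: int, monthly_loaded_cost: int, months: int) -> int:
        return engineers * monthly_loaded_cost * months

    def buy_cost(annual_license: int, integration_months: int,
                 monthly_loaded_cost: int) -> int:
        return annual_license + integration_months * monthly_loaded_cost

    # e.g. two engineers at $15k/month building for a year, versus a
    # $60k/year license plus one engineer-month of integration work.
    print(build_cost(2, 15_000, 12))    # 360000 for year one of building
    print(buy_cost(60_000, 1, 15_000))  # 75000 for year one of buying

Numbers like these rarely settle the question on their own - ongoing maintenance, opportunity cost, and lock-in risk all belong in the model - but writing the arithmetic down keeps the debate grounded.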

Balancing Existing Tools

At established companies, there’s also the question of whether the new tool will play nicely with all your existing tools, and whether the new tool offers sufficient additional value to justify its integration cost.

Questions that you should ask include:

  • What are we currently using to solve the problem? How painful is it? How does the new tool improve on that experience?
  • Does the new tool integrate smoothly with our existing data stack?
  • Will it require a significant change in our data workflows or can it be incorporated with minimal disruption?
  • Does the tool require training and onboarding (e.g. an obscure DSL) or do our engineers already have the skills to use it?
  • What are the long-term benefits versus the costs of integration?

Looking Ahead

Data leaders shape their organizations' data future, envisioning how data can drive business value over 3-5 years. They are responsible for creating scalable, flexible data platforms, tools, and organizations, and advocating for data's strategic importance at the executive level.

To fulfill this role, data leaders have quite the balancing act - on the one hand, they should drive innovative solutions; on the other, they should also be institutionalizing best practices around data governance, access, and culture.

Perhaps most challengingly, data leaders have to wrangle people - assembling and nurturing skilled, collaborative teams.

