January 1, 2024

Three Essential Trends For Data Leaders in 2024: LLMs, Verticalization, and Consolidation

As we step into 2024, the data and analytics landscape continues to evolve, presenting data leaders with new opportunities as well as new challenges.

Kyle Kirwan

Three trends are poised to significantly shape the industry this year. Let's look at each one and what it means for data teams.

RAG-Based LLMs

Large language models (LLMs) took center stage in 2023, changing how organizations process and analyze data. That rapid adoption also exposed some notable weaknesses. Casual users of generative AI apps like ChatGPT might not see the extent of these issues, but in enterprise data applications, hallucinations (confident but fabricated answers) and training-data cutoffs, which leave a model unaware of anything newer than its training set, can wreak havoc.

Enter retrieval-augmented generation (RAG), a promising approach to these issues that could substantially improve data accessibility within enterprises.

RAG counters 'hallucinations' by grounding the model's answers in retrieved documents. At query time, the system searches an external data store for relevant records and passes them to the LLM as context, so each answer can be traced back to its sources (auditable) and can reflect data newer than the model's training set (up-to-date). Retrieval reduces hallucinations rather than eliminating them, but for data professionals, understanding and harnessing RAG-based LLMs is pivotal, as they could significantly enhance the reliability and relevance of the insights derived from these models.
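To make the retrieval step concrete, here is a minimal Python sketch of the idea, not a production design. The in-memory document store, its IDs and contents, and the function names (`retrieve`, `build_prompt`) are all hypothetical. A simple bag-of-words cosine similarity stands in for the embedding-based vector search a real system would more likely use, and the call to the LLM itself is left out.

```python
import math
import re
from collections import Counter

# Hypothetical "external data store": document ID -> text (illustrative only).
DOCUMENTS = {
    "policy-001": "Refund requests must be filed within 30 days of purchase.",
    "catalog-002": "The sales table is refreshed nightly by the ingestion job.",
    "runbook-003": "Pipeline outages are escalated to the on-call rotation.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Return up to k document IDs, most similar first, dropping zero scores."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(text))), doc_id)
        for doc_id, text in DOCUMENTS.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(query, k=2):
    """Assemble a prompt that carries the retrieved text and its source IDs."""
    sources = retrieve(query, k)
    context = "\n".join(f"[{doc_id}] {DOCUMENTS[doc_id]}" for doc_id in sources)
    prompt = (
        "Answer using only the sources below and cite their IDs.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )
    return prompt, sources


if __name__ == "__main__":
    prompt, sources = build_prompt("How many days do I have to file a refund request?")
    print(prompt)
    print("Retrieved:", sources)
```

Because the prompt carries the source IDs alongside the retrieved text, a reviewer can check an answer against the documents it was built from, and updating the store changes what the model sees without retraining it.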

Actionable Advice: Adopt RAG-based LLMs where they fit. Invest in training your team in this area, and consider implementing these models for key data initiatives to improve response accuracy, reduce 'hallucinations,' and ensure the information provided is up-to-date and auditable.

Verticalization Within Data Infrastructure Providers

The trend toward verticalization among data infrastructure providers has been steadily gaining momentum, marked by significant acquisitions in recent years: Databricks bought MosaicML and Arcion in 2023, Snowflake purchased Streamlit and Neeva, and dbt Labs acquired Transform.

These acquisitions represent a move toward vertical integration, with large data infrastructure providers aiming to offer comprehensive, end-to-end solutions within the data ecosystem. This not only redefines the market landscape but also opens new possibilities for integrating and using these platforms effectively.

Actionable Advice: Monitor how these integrations might impact your data operations tools and processes. Assess how the changing landscape might offer new solutions or alter the functionalities of existing tools in your data stack. Explore native capabilities from your cloud provider and determine when you can consolidate your tooling and when it makes sense to seek out an independent, best-of-breed solution.

Consolidation Within Data Operations

The data operations sector is following a parallel pattern, as startups launched between 2020 and 2022 reach the end of their runways. The clearest sign is a string of acquisitions by established players: IBM acquired Manta, Teradata acquired Stemma, Collibra acquired OwlDQ and SQLDep, and Bigeye acquired Data Advantage Group.

This marks a self-correcting trajectory within the industry, where the influx of data tools that came onto the market several years ago is consolidating into just a few key players.

Actionable Advice: Fully evaluate the financial health, reputation, and strategic alignment of potential vendors. Ensure that your chosen partners can maintain services, provide consistent support, and align with your organization's long-term goals. Carefully review existing integrations when considering a data operations vendor to ensure the solution will work seamlessly in your data stack.

As we enter 2024, these three trends—RAG-based LLMs, verticalization within data infrastructure, and the consolidation of data operations—present both opportunities and challenges for data and analytics professionals. Staying informed, adaptable, and ready to embrace these shifts will be the key to thriving in an ever-evolving data landscape.
