In 2021, companies worldwide spent $39.2 billion on cloud databases. Despite that growth, most organizations still do not keep the majority of their corporate data in a modern cloud data warehouse.
Monte Carlo’s survey of more than 200 data professionals at Snowflake Summit 2022 found that less than 25 percent of business data resides in a cloud system. For proponents of cloud solutions, this means that companies are not realizing the flexibility, scalability and security advantages that cloud computing can offer.
Changing this means challenging those who prefer on-premises data storage, and legacy systems carry risks of their own. At Equifax, for example, buggy code on a legacy server misjudged the creditworthiness of millions of consumers.
One of the proponents of the cloud is Lior Gavish, co-founder and CTO of Monte Carlo. Gavish looks at the key trends likely to impact the broader field of data engineering and analytics in 2023.
Data contracts
According to Gavish, data contracts are designed to prevent data quality issues that arise upstream when data-generating services change unexpectedly. As he notes: “Data contracts are very much in vogue. Why? Software engineers make changes whose effects unknowingly ripple through updates into the downstream data pipeline, and with the rise of data modeling, data engineers have the ability to deliver data to the warehouse pre-modeled. 2023 will see broader adoption of data contracts as practitioners try to apply these frameworks.”
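To make the idea concrete, here is a minimal sketch of what a data contract check could look like in Python. It is an illustration only, not a method Gavish or Monte Carlo describes: the `ContractField` class, the `ORDERS_CONTRACT` definition and the `violations` function are all hypothetical names, and the check covers only field presence, type and nullability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractField:
    """One field the producer promises to deliver (hypothetical structure)."""
    name: str
    type: type
    nullable: bool = False


# Hypothetical contract for an "orders" event agreed between producer and consumer.
ORDERS_CONTRACT = [
    ContractField("order_id", str),
    ContractField("amount_cents", int),
    ContractField("coupon_code", str, nullable=True),
]


def violations(record: dict, contract: list[ContractField]) -> list[str]:
    """Return a list of human-readable contract violations for one record."""
    problems = []
    for field in contract:
        if field.name not in record:
            problems.append(f"missing field: {field.name}")
            continue
        value = record[field.name]
        if value is None:
            if not field.nullable:
                problems.append(f"null not allowed: {field.name}")
        elif not isinstance(value, field.type):
            problems.append(
                f"wrong type for {field.name}: "
                f"expected {field.type.__name__}, got {type(value).__name__}"
            )
    return problems
```

A producing service could run such a check before publishing, so that an upstream change (say, sending the amount as a string) is rejected at the source rather than discovered later in the warehouse.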
Monetization of data
This trend is driven by economic pressures. Gavish notes, “During tough times, data teams are under more pressure than ever to align their efforts with the bottom line. Data monetization is a mechanism for data teams to tie their work directly to revenue. It also allows data insights and reports to be added to products, a differentiator in an increasingly competitive marketplace.”
Infrastructure as code
A more recent approach treats infrastructure itself as something to be defined and managed in code. Gavish explains, “Modern data operations require hyper-scalable cloud infrastructures, but constantly deploying and maintaining these services can be tedious and time-consuming. Infrastructure as code enables data engineers to create a more seamless data pipeline infrastructure that’s easier to provision, deprovision, and change, which is critical when budgets are tight and staff numbers are limited.”
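The core mechanism behind infrastructure as code can be sketched in a few lines of Python: the desired infrastructure is declared as data, and a tool compares it with what currently exists to work out what to provision, deprovision or change. The sketch below is a simplified illustration under that assumption, not the behavior of any particular tool; the `plan` function and the resource names are hypothetical.

```python
def plan(current: dict, desired: dict) -> dict:
    """Compare declared (desired) resources with existing (current) ones.

    Both arguments map a resource name to its configuration.
    Returns the names of resources to create, delete and update.
    """
    return {
        "create": sorted(desired.keys() - current.keys()),
        "delete": sorted(current.keys() - desired.keys()),
        "update": sorted(
            name
            for name in desired.keys() & current.keys()
            if desired[name] != current[name]
        ),
    }


# Hypothetical example: what exists today versus what the code declares.
current_state = {
    "warehouse_small": {"size": "S"},
    "old_staging_bucket": {"region": "eu"},
}
desired_state = {
    "warehouse_small": {"size": "M"},       # resized
    "ingest_queue": {"retention_days": 7},  # new
}

changes = plan(current_state, desired_state)
```

Because the desired state lives in a file under version control, a change to the pipeline infrastructure becomes a reviewed code change rather than a series of manual steps.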
Data reliability engineering
Data must be reliable to meet business needs and retain customers. Gavish says, “All too often, bad data is first discovered by downstream stakeholders in dashboards and reports, rather than in the pipeline or even before it. Because data is seldom in its ideal, totally reliable state, data teams hire data reliability engineers to set up the tools (like data observability platforms and data testing) and processes (like CI/CD) to ensure that when problems arise, they are resolved quickly and the impact is shared with those who need to know, before your CFO finds out.”
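As a rough illustration of the kind of data testing Gavish mentions, the Python sketch below checks a table for two common problems: stale data and too many missing values. It is a simplified example, not a description of any observability platform; the `check_table` function, the `loaded_at` column and the thresholds are assumptions chosen for the illustration.

```python
from datetime import datetime, timedelta, timezone


def check_table(rows, column, now,
                max_age=timedelta(hours=24), max_null_rate=0.05):
    """Return alert messages for freshness and null-rate problems.

    rows: list of dicts, each with a timezone-aware "loaded_at" timestamp
          (assumed column name) and the column under test.
    """
    if not rows:
        return ["table is empty"]

    alerts = []

    # Freshness: the newest row should not be older than max_age.
    newest = max(row["loaded_at"] for row in rows)
    if now - newest > max_age:
        alerts.append(f"stale data: newest row loaded {now - newest} ago")

    # Completeness: the share of missing values should stay below the threshold.
    null_rate = sum(row[column] is None for row in rows) / len(rows)
    if null_rate > max_null_rate:
        alerts.append(
            f"null rate {null_rate:.0%} in {column} exceeds {max_null_rate:.0%}"
        )
    return alerts
```

Run as part of a pipeline or a CI/CD step, a check like this lets the data team hear about a problem before it surfaces in a dashboard.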