Opinion

How to set expectations around data quality and reliability for your company

Image courtesy of Yevgenij_D on Shutterstock, available for use by author with Standard License.

For today’s data engineering teams, the demand for real-time, accurate data has never been higher, yet broken pipelines and stale dashboards are an all-too-common reality. So, how can we break this vicious cycle and achieve reliable data?

Just like our software engineering counterparts 20 years ago, data teams in the early 2020s are facing a significant challenge: reliability.

Companies are ingesting more operational and third-party data than ever before. Employees from across the business, including those on non-data teams, are interacting with data at all stages of its lifecycle. …


Opinion

Why we need to rethink our approach to metadata management and data governance

Image courtesy of Andrey_Kuzmin on Shutterstock

As companies increasingly leverage data to power digital products, drive decision making, and fuel innovation, understanding the health and reliability of these most critical assets is fundamental. For decades, organizations have relied on data catalogs to power data governance. But is that enough?

Debashis Saha, VP of Engineering at AppZen, formerly at eBay and Intuit, and Barr Moses, CEO and Co-founder of Monte Carlo, discuss why data catalogs aren’t meeting the needs of the modern data stack, and how a new approach — data discovery — is needed to better facilitate metadata management and data reliability.

It’s no secret: knowing…


Image courtesy of Monte Carlo

Monte Carlo, the data observability company, today announced the launch of the Monte Carlo Data Observability Platform, the first end-to-end solution to prevent broken data pipelines. Monte Carlo’s solution delivers the power of data observability, giving data engineering and analytics teams the ability to solve the costly problem of data downtime.

As businesses increasingly rely on data to drive better decision making and maintain their competitive edge, it’s mission-critical that this data is accurate and trustworthy. Today, companies spend upwards of $15 million annually tackling data downtime, in other words, periods of time where data is missing, broken, or otherwise…


Introducing a new approach to understanding the reliability of your data

Image courtesy of FoxyImage on Shutterstock, approved for use through Shutterstock’s Standard License.

Companies spend upwards of $15 million annually tackling data downtime, in other words, periods of time where data is missing, broken, or otherwise erroneous, and over 88 percent of U.S. businesses have lost money as a result of data quality issues.

Fortunately, there’s hope in the next frontier of data engineering: data observability. Here’s how the data engineering team at Blinkist, a book-summarizing subscription service, increases cost savings, collaboration, and productivity with data observability at scale.

With over 16 million users worldwide, Blinkist helps time-strapped readers fit learning into their lives through its ebook subscription service.

Gopi Krishnamurthy, Director of…


A primer for data teams on the latest industry trend: the data mesh

Image courtesy of Nixx Photography on Shutterstock, available for use with Standard License purchased by author.

Your company wants to build a data mesh. Great! Now what? Here’s a quick primer to get you started — and prevent your data infrastructure from turning into a hot mesh.

Since the early 2010s, microservice architectures have been adopted by companies far and wide (think: Uber, Netflix, and Airbnb, among others) as the software paradigm du jour, sparking discussion among engineering teams about the pros and cons of domain-oriented design.

Now, in 2021, you’d be hard-pressed to find a data engineer whose team isn’t debating whether or not to migrate from a monolithic architecture to a decentralized data mesh.


Tutorial

Using schema and lineage to understand the root cause of your data anomalies

Image courtesy of Lucas Pezeta on Pexels.

In this article series, we walk through how you can create your own data observability monitors from scratch, mapping to five key pillars of data health. Part I can be found here.

Part II of this series was adapted from Barr Moses and Ryan Kearns’ O’Reilly training, Managing Data Downtime: Applying Observability to Your Data Pipelines, the industry’s first-ever course on data observability. The associated exercises are available here, and the adapted code shown in this article is available here.

As the world’s appetite for data increases, robust data pipelines are all the more imperative. When data breaks — whether…


Meet the data leaders charting a path forward for reliable data in the New Year

Image courtesy of SanaStock on Shutterstock.

It’s no secret: data is your company’s most valuable asset.

Your Marketing Analytics team uses data to inform their email campaigns; your product managers leverage insights about user behavior to prioritize the development of new features; and even your Operations team relies on data to develop growth strategies.

Unfortunately, most companies fail to realize data’s full potential due to an all-too-common reality for data teams: data downtime. Data downtime, in other words, periods of time when data is inaccurate, unreliable, or otherwise erroneous, spares no one. It manifests in broken pipelines, stale dashboards, and…


How we’re charting a new path forward for data trust and reliability

Image courtesy of Monte Carlo.

In 2021, data is your company’s most critical asset.

As data pipelines become increasingly complex and companies ingest more and more data, it’s paramount that this data is reliable. After talking to hundreds of data teams over the past few years, I was struck by the fact that organizations were investing millions of dollars and strategic energy in data, but decision makers and others on the frontlines couldn’t use it or didn’t trust it. There had to be a better way.

In 2019, we founded Monte Carlo to…


Opinion

And how to achieve them.

Image courtesy of Anastasia Petrova on Unsplash.

In 2021, it’s not just about having the “modern data stack.” It’s about having a modern approach to working with your data. Here’s how we get there.

Over the past few weeks, I’ve been having lots of conversations with some of the world’s best data teams about their 2021 priorities. While many teams are focused on upgrading or scaling out existing infrastructure, two “resolutions” have really stuck out to me:

  1. Bringing engineering and data organizations closer together
  2. Directly connecting data producers with data consumers

Unlike so many others, these two priorities are decidedly not technical, speaking to the need not…


Tutorial

How to build your own data quality monitors to identify freshness and distribution anomalies in your data pipelines

By Ryan Kearns and Barr Moses

Image courtesy of faaiq ackmerd on Pexels.

In this article series, we walk through how you can create your own data observability monitors from scratch, mapping to five key pillars of data health. Part I of this series was adapted from Barr Moses and Ryan Kearns’ O’Reilly training, Managing Data Downtime: Applying Observability to Your Data Pipelines, the industry’s first-ever course on data observability. The associated exercises are available here, and the adapted code shown in this article is available here.

From null values and duplicate rows, to modeling errors and schema changes, data can break for many reasons. Data testing

Barr Moses

Co-Founder and CEO, Monte Carlo (www.montecarlodata.com). @BM_DataDowntime #datadowntime
