Streamlined data reliability engineering notifications with Bigeye
Introducing many new Slack-related features! In this post, we'll walk through them.
We’re excited to announce improvements to notifications in Bigeye. Data engineers spend large amounts of time in Slack, MS Teams, PagerDuty, and email, so it's important to get real-time notifications from Bigeye in the tools where engineers already collaborate. With our latest release, notifications are now comprehensive: data engineers get status updates regardless of whether the actions happen in Slack, in the Bigeye app, or via the Bigeye API. Notifications are now consolidated into threads, making it easier to collaborate with teammates when debugging specific issues. Slack notifications are now also interactive, allowing data engineers to triage, track, and resolve issues directly within Slack. Finally, we’ve streamlined notifications so that every alert is relevant and actionable.
As an issue goes through its lifecycle, Slack notifications about it are sent to the same Slack thread. The thread’s root message stays in sync with the issue’s status and summarizes its current state. This continuity allows Slack users to quickly scan through a channel’s timeline and get an idea of the status of the issues in it.
Actions that cause transitions show up in the notification stream, regardless of whether the action is taken in Slack, in the Bigeye app, or via the Bigeye API. You can get the context and track the lifecycle of most issues entirely from within Slack, even if the people on your team access Bigeye in different ways.
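To make the threading behavior concrete, here is a minimal in-memory sketch of the idea: one thread per issue, every transition appended as a reply, and the root message rewritten to match the latest status. It is an illustration only, not Bigeye's implementation. The class names, the status values (`new`, `acknowledged`, `closed`), and the source labels are all assumptions. With Slack's Web API, the same pattern would roughly correspond to posting replies with `chat.postMessage` and a `thread_ts`, and editing the root with `chat.update`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Thread:
    """In-memory stand-in for one Slack thread (hypothetical model)."""
    root_text: str
    replies: List[str] = field(default_factory=list)


class IssueNotifier:
    """Keeps one thread per issue and mirrors the issue status in its root."""

    def __init__(self) -> None:
        self.threads: Dict[str, Thread] = {}

    def record(self, issue_id: str, status: str, source: str) -> Thread:
        summary = f"Issue {issue_id}: {status}"
        thread = self.threads.get(issue_id)
        if thread is None:
            # First event for this issue: open a new thread.
            thread = Thread(root_text=summary)
            self.threads[issue_id] = thread
        else:
            # Later events: rewrite the root so it shows the current state.
            thread.root_text = summary
        # Every transition is also appended to the thread as a reply,
        # tagged with where the action came from (assumed labels).
        thread.replies.append(f"{status} (via {source})")
        return thread


notifier = IssueNotifier()
notifier.record("42", "new", source="bigeye")
notifier.record("42", "acknowledged", source="slack")
thread = notifier.record("42", "closed", source="api")
print(thread.root_text)
for reply in thread.replies:
    print("  ", reply)
```

Keeping the summary in the root and the history in the replies is what lets someone scan a channel for current status without opening each thread.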
Since notifications are threaded, you now have a place where you and your teammates can collaborate on the issue, inline with the updates on the issue. You can use core Slack functionality such as @-mentions to bring in other teammates who may have expertise in the area. All correspondence can now live in one place and can serve as the basis for a retrospective or runbook after a data reliability incident is resolved.
Now that you are using Slack to get updates and collaborate with teammates, you’ll likely reach a conclusion on an issue there. You can now take action on a metric directly from Slack: mute items, close issues, and give feedback on whether thresholds need to be updated.
As these actions are taken, the state in the app and in the Slack thread is updated. Everyone can track the state of their data pipelines regardless of whether they are in Slack or viewing the issue in the Bigeye app.
We know the value of a data engineer’s time, and we work hard to ensure every notification is worthy of their attention. The latest update streamlines notifications so they only occur when data changes cause a metric to alert, when an alerting metric becomes healthy, when the state of an issue changes, or when there are configuration changes that affect an issue. Getting a notification therefore means something has changed, which makes it more likely to be actionable and reduces unnecessary alert noise.
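The four triggers above can be read as a simple change-detection rule: compare the previous and current state, and notify only if one of the listed things differs. The sketch below illustrates that reading under stated assumptions. The `Snapshot` fields, the `notification_reason` function, and the use of a configuration version number are hypothetical and are not taken from Bigeye's product or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical view of a metric and its issue at one point in time."""
    metric_alerting: bool
    issue_status: str
    config_version: int


def notification_reason(prev: Snapshot, curr: Snapshot) -> Optional[str]:
    """Return why a notification should be sent, or None if nothing changed."""
    if not prev.metric_alerting and curr.metric_alerting:
        return "metric started alerting"
    if prev.metric_alerting and not curr.metric_alerting:
        return "metric became healthy"
    if prev.issue_status != curr.issue_status:
        return "issue status changed"
    if prev.config_version != curr.config_version:
        return "configuration changed"
    # Same alert state, same issue status, same config: stay quiet.
    return None


before = Snapshot(metric_alerting=True, issue_status="new", config_version=1)
unchanged = Snapshot(metric_alerting=True, issue_status="new", config_version=1)
recovered = Snapshot(metric_alerting=False, issue_status="new", config_version=1)

print(notification_reason(before, unchanged))
print(notification_reason(before, recovered))
```

A metric that keeps alerting with no other change produces no new notification in this model, which is the noise reduction the paragraph describes.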
The updates to the notification experience make notifications more actionable and meaningful. They enable data reliability engineers (DREs) to interact with notifications and react to issues in their infrastructure, whether they are at a computer or on a tablet or phone. Because more can be handled on the first pass, subsequent triage and review of data quality issues is faster.