We classify three types of production issues:
outages - An outage is any user-impacting disruption of service.
bugs - Bugs are not always outages. A bug generally affects a single version of the codebase.
priority::high, priority::higher, or priority::highest.
Examples of outages include:
- meltano.com website is down.
- hub.meltano.com website is down.
- discovery.yml web endpoint is down.
- pipx install meltano is failing for any reason, including upstream package dependency breakages, PyPI outages, etc.
(urgency::highest)
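As a rough illustration of how the website examples above could be checked, here is a minimal sketch in Python. The `https://` scheme, the root paths, the `SITES` list, and the helper names `default_fetch` and `find_down_sites` are assumptions made for this example; this is not the team's actual monitoring setup. The discovery.yml endpoint is left out because its URL is not given here.

```python
from urllib.request import Request, urlopen

# Hosts come from the outage examples above; the https:// scheme and the
# root path are assumptions made for this sketch.
SITES = ["https://meltano.com", "https://hub.meltano.com"]


def default_fetch(url, timeout=10.0):
    """Return the HTTP status code for a URL (performs a real request)."""
    request = Request(url, headers={"User-Agent": "outage-check-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.status


def find_down_sites(urls, fetch=default_fetch):
    """Return the URLs that raised an error or answered with a non-2xx status."""
    down = []
    for url in urls:
        try:
            status = fetch(url)
        except OSError:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            down.append(url)
            continue
        if not 200 <= status < 300:
            down.append(url)
    return down
```

Calling `find_down_sites(SITES)` issues real HTTP requests. The `fetch` parameter exists so the logic can be exercised without network access.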
Examples of critical bugs include:
Bugs labeled priority::highest should trigger an alert as soon as possible and should be resolved within 24 hours. With approval from a Staff Engineer or higher, the problem version may optionally be yanked from PyPI.
Always tag AJ, Taylor, and Florian when a critical bug is identified.
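The 24-hour target for priority::highest bugs can be sketched as a small helper. This is only an illustration: the function name `resolution_deadline` is hypothetical, and it assumes the clock starts when the bug is reported, which the text above does not specify.

```python
from datetime import datetime, timedelta, timezone

CRITICAL_LABEL = "priority::highest"
# Assumption: the 24-hour window is measured from the time the bug is reported.
RESOLUTION_WINDOW = timedelta(hours=24)


def resolution_deadline(labels, reported_at):
    """Return the resolve-by time for a priority::highest bug, else None."""
    if CRITICAL_LABEL in labels:
        return reported_at + RESOLUTION_WINDOW
    return None


if __name__ == "__main__":
    reported = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(resolution_deadline(["bug", "priority::highest"], reported))
```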
The #meltano-alerts Slack channel receives alerts for outages and high-priority bugs.
The #troubleshooting channel is the primary place we notify users of outages and critical bugs. Depending on severity and percentage of users impacted, we may also notify users in the #announcements channel.
If you are responding to an alert in #meltano-alerts:
If you have identified a production outage or a critical bug and no alert is yet logged to #meltano-alerts:
#meltano-alerts.
When outages are expected to impact users, please share the alert or create a new notification in the #troubleshooting channel. Users who would otherwise inquire in #troubleshooting should then discover your notification and know that the Meltano team is addressing the issue.
Occasionally we observe outages due to upstream service failures.
If the issue requires action from us or is otherwise worthy of investigation, we should log an issue for tracking our work and then proceed with the alerting process.
If the issue does not require any action from us, such as a significant PyPI or GitLab service outage, we may not need to open an issue but we should nevertheless notify users as appropriate.
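The two upstream cases above can be summarized as a small decision helper. This is a sketch under stated assumptions: the function name, the action strings, and the `users_impacted` flag (standing in for "as appropriate") are hypothetical and not part of any existing tooling.

```python
def upstream_outage_actions(requires_action, users_impacted):
    """List the follow-up steps for an upstream service failure.

    requires_action: we must act, or the issue is worth investigating.
    users_impacted: assumption standing in for "notify users as appropriate".
    """
    actions = []
    if requires_action:
        # Track our own work first, then proceed with the alerting process.
        actions.append("open tracking issue")
        actions.append("follow alerting process")
    if users_impacted:
        # Users are notified even when no issue is opened.
        actions.append("notify users in #troubleshooting")
    return actions
```

For example, a significant PyPI outage that needs no work from the team but affects users would yield only the notification step.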