Build platform connectivity issues

Incident Report for Semaphore CI

Resolved

Semaphore Platform's upstream provider has recovered from partial network issues caused by BGP hijacking in an external Telia network. If you notice any intermittent network issues, please contact our support team.

We will post a detailed post-mortem on https://semaphoreci.com/blog soon, and will reach out to all customers who have submitted support requests related to this issue.

Once again we'd like to apologize for the inconvenience this may have caused you, and thank you for your patience. Here's to a better week ahead.

Posted Dec 18, 2017 - 00:44 UTC

Update

Our upstream provider has successfully rerouted all communication to AWS and other affected services. We are continuously monitoring these channels and will share more details about the cause soon. If you notice any network issues, particularly related to SSL, please contact our support team.

In the meantime, we are rolling out a new build cluster based in the US as a backup.

We understand how difficult these past few days have been for customers who experienced these issues and are thankful for the support and patience that you've shown.

Posted Dec 17, 2017 - 21:16 UTC

Update

We've added a new `retry` command to our build platform. It lets you retry individual build commands, which helps avoid build failures caused by intermittent network errors. Read more at http://semaphoreci.com/docs/customizing-build-commands.html#retrying-commands.
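For an illustration of the general retry idea, here is a minimal Python sketch of a retry wrapper. It runs a command until it exits with status 0 or a fixed number of attempts is used up, sleeping between attempts so a transient network fault has time to clear. The function name `retry_command` and its `attempts` and `delay` parameters are hypothetical. They do not describe the options or the implementation of Semaphore's `retry` command, which are documented at the link above.

```python
import subprocess
import time
from typing import Sequence, Tuple


def retry_command(cmd: Sequence[str], attempts: int = 3, delay: float = 5.0) -> Tuple[int, int]:
    """Run `cmd` until it exits with status 0 or `attempts` runs are used up.

    Hypothetical helper for illustration only; not Semaphore's implementation.
    Returns (exit code of the last run, number of runs performed).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            # Success: stop retrying immediately.
            return 0, attempt
        if attempt < attempts:
            # Pause before the next attempt; no sleep after the final failure.
            time.sleep(delay)
    # Every attempt failed: report the last exit code so the build still fails.
    return result.returncode, attempts
```

For example, `retry_command(["bundle", "install"], attempts=3, delay=5)` would run `bundle install` up to three times. The sketch only retries on a non-zero exit status. A missing executable raises `FileNotFoundError` instead of being retried, because retrying cannot fix that.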

Posted Dec 16, 2017 - 19:34 UTC

Update

We continue to monitor network connectivity and are working with GitHub to reroute traffic away from Telia, an intermediary network provider.

Posted Dec 16, 2017 - 11:40 UTC

Monitoring

We've rerouted traffic to Docker Hub which should eliminate network errors, including TLS handshake errors. We are continuing to monitor the situation.

If you notice network errors when communicating with Docker Hub or any other host, please contact our support team with the affected domain and IP address.

Posted Dec 15, 2017 - 18:51 UTC

Identified

Our upstream provider has rerouted all traffic to github.com, which should eliminate all errors in communication with GitHub. We continue to work on solutions for other services, particularly Docker registries.

Feel free to contact our support team to share specifics on how these issues are affecting your team. Thank you for bearing with us.

Posted Dec 15, 2017 - 16:20 UTC

Update

We continue to work on rerouting outgoing traffic from the build cluster.

Posted Dec 15, 2017 - 12:24 UTC

Update

We've identified that the source of the network problems lies outside our upstream provider's infrastructure. We're working on rerouting external traffic from the build cluster to avoid the problematic routes.

Posted Dec 15, 2017 - 00:28 UTC

Update

We are working to isolate the parts of the build cluster network that are causing sporadic packet loss. We will continue to keep you posted on our progress.

Posted Dec 14, 2017 - 18:09 UTC

Update

We continue to investigate the root cause of the network instability in the build cluster. As a workaround, we have increased the default retry count for the most commonly used package managers, RubyGems and NPM.

Posted Dec 14, 2017 - 13:25 UTC

Update

We are still working with our upstream provider to resolve the networking issues. A small percentage of network communication remains affected. After rerouting all traffic to avoid Telia Transit, we are closer to finding the root cause. We apologize for any issues your team is having, and we assure you that we're doing all we can to resolve this as quickly as possible.

Posted Dec 14, 2017 - 10:26 UTC

Investigating

We're seeing occasional packet loss in communication between the Semaphore platform and some remote hosts. We're monitoring the situation and working with our upstream provider to determine what we can do to mitigate it.

Posted Dec 13, 2017 - 13:06 UTC