With the holiday shopping season kicking into high gear, are you ready to handle the masses? (Originally Published November 21, 2017)

Last week’s Black Friday marked the beginning of the holiday shopping season, arguably one of the busiest times of the year for those in the e-commerce industry. As consumers ramp up their seasonal spending, retailers, credit card companies and digital payment services alike are under immense pressure to deliver a seamless buying experience.

Is your application ready to handle the holiday surge? Below are a few tips and tools to keep in mind in anticipation of the busiest shopping season of the year:

1. Coordinate Across Teams

Is your engineering team aligned with product on any holiday season releases? Likewise, is engineering aligned with the various lines of business? 

It’s critical not only to keep your own team in sync so it can respond quickly to an outage, but also to stay in lockstep with what marketing, sales and other business units are planning for the season. Understanding promotional plans and other seasonal activities can help you better anticipate traffic spikes and time new deployments accordingly.

2. Make Sure Your Alerts Are Meaningful

The only thing worse than an application outage during peak season is hearing about that outage from your customers. In order to avoid this, you need your monitoring and alerting ecosystem firing on all cylinders. 

There are hundreds, thousands or even millions of things happening within your application at any given time, and you want to keep track of everything that’s going on. But more importantly, you need to know as soon as something goes wrong – ideally, before it affects your users and customers.

Ask yourself the following questions about your monitoring toolchain:

  1. Do we know which issues are critical (i.e. which errors or slowdowns have high potential for customer impact)?
  2. Do we have real-time alerts for those issues?

There are endless options for what each alert can hold, but adding too many indicators can turn critical alerts into noise you’ll have to sift through. OverOps helps organizations identify critical issues in real time so that their customers have a flawless shopping experience.

DevOps engineers and SREs get notified of critical issues that would otherwise be missed, including things like new and increasing errors, and developers get an immediate alert if their code is affected – including a snapshot with the complete context of the error. Without that visibility, critical errors like new uncaught exceptions or increasing NullPointerExceptions can reach customers before anyone on the team notices.
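To make "new and increasing errors" concrete, here is a minimal Python sketch of the general idea. It is not how OverOps works internally; the function name `classify_errors`, the `increase_factor` threshold and the sample error names are all assumptions made for illustration. It compares error occurrences from a baseline window against a current window of the same length and flags only the types that are new or have grown sharply.

```python
from collections import Counter


def classify_errors(baseline, current, increase_factor=2.0):
    """Flag error types that are new or sharply increasing.

    `baseline` and `current` are iterables of error type names, one entry
    per occurrence, collected over two time windows of equal length.
    `increase_factor` is a hypothetical threshold: an error counts as
    "increasing" when its count is at least this multiple of the baseline.
    """
    base_counts = Counter(baseline)
    cur_counts = Counter(current)

    # Seen now, never seen in the baseline window.
    new = sorted(e for e in cur_counts if e not in base_counts)

    # Seen in both windows, but much more often now.
    increasing = sorted(
        e for e, n in cur_counts.items()
        if e in base_counts and n >= increase_factor * base_counts[e]
    )
    return {"new": new, "increasing": increasing}


# Illustrative data only: last week's window vs. the current window.
baseline = ["TimeoutError"] * 5 + ["NullPointerException"] * 2
current = (
    ["TimeoutError"] * 4
    + ["NullPointerException"] * 6
    + ["CartSerializationError"]
)
alerts = classify_errors(baseline, current)
print(alerts)
```

Alerting only on these two buckets, rather than on every logged error, is one way to keep critical alerts from turning into noise.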

3. Adopt a Shift Left Approach to Quality

There’s always at least one sneaky bug that makes its way into production without you noticing – until it’s too late, and customers are already complaining about it. That said, there’s no time like the present to think about code quality early and often in your SDLC. 

Revisit your pre-production workflow to ensure you are optimizing your CI/CD pipeline for code stability. The better your tests and QA processes are, the less likely you are to experience a major incident that impacts holiday shoppers. This means employing a multitude of testing and QA practices, as well as leveraging automation to detect and block unstable releases before they get promoted.

Current testing methods can only account for the code paths we’re able to foresee, meaning that even with 100% code coverage, critical errors can still slip through the cracks. OverOps augments existing tests and helps identify these missed errors by analyzing code at runtime and acting as a quality gate for preventing bad code from moving into production. 

When a critical issue is identified, OverOps plugins for popular CI/CD tools like Jenkins and TeamCity automatically block risky builds and route the issue(s) back to the relevant developer for fast resolution. This ensures your holiday shoppers never experience a disruption due to an unstable deployment.
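As a rough sketch of what a quality gate boils down to, the Python below turns a build's error summary into a pass/fail decision. It does not use the OverOps plugins or any Jenkins or TeamCity API; `BuildReport`, `evaluate_gate`, `exit_code` and the default thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BuildReport:
    """Hypothetical summary of what was observed while testing a build."""
    new_errors: int                # error types not seen in the previous release
    critical_errors: int           # e.g. uncaught exceptions
    errors_per_1k_requests: float  # overall error rate in the test environment


def evaluate_gate(report, max_new_errors=0, max_critical_errors=0,
                  max_errors_per_1k=1.0):
    """Return (passed, reasons). The thresholds are illustrative defaults."""
    reasons = []
    if report.new_errors > max_new_errors:
        reasons.append(
            f"{report.new_errors} new error(s), limit is {max_new_errors}")
    if report.critical_errors > max_critical_errors:
        reasons.append(
            f"{report.critical_errors} critical error(s), "
            f"limit is {max_critical_errors}")
    if report.errors_per_1k_requests > max_errors_per_1k:
        reasons.append(
            f"error rate {report.errors_per_1k_requests} per 1k requests, "
            f"limit is {max_errors_per_1k}")
    return len(reasons) == 0, reasons


def exit_code(report):
    """Map the gate result to a process exit code (0 = promote, 1 = block)."""
    passed, reasons = evaluate_gate(report)
    for reason in reasons:
        print(f"GATE FAILED: {reason}")
    return 0 if passed else 1
```

A CI step could pass the result of `exit_code(...)` to `sys.exit`, since most CI servers treat a non-zero exit status as a failed step and stop the build from being promoted.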

4. Streamline Your Troubleshooting Process

If an application error crashes your shopping cart service on one of the busiest shopping days of the year, you don’t want to waste any time getting that issue resolved. Every additional minute spent debugging in these pivotal moments costs you sales and customer trust.

Once an issue has happened, you need to know why it happened and how to fix it, quickly. Traditional monitoring tools, such as APMs and log analyzers, rely heavily on foresight and manual practices and are often unable to provide the detailed context needed to troubleshoot an error fast.

OverOps helps organizations resolve issues quickly by capturing the True Root Cause of critical errors and exceptions – even those missed by log management and APM tools. With our JIRA integration, developers automatically receive the complete context needed to reproduce and fix issues, including the complete source code, variables, DEBUG logs and environment state behind any error or slowdown.
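The underlying idea, recording variable state at the moment of failure instead of relying on log statements someone had the foresight to write, can be illustrated with a small Python sketch. This is not OverOps code and is far simpler than a production tool; the decorator `capture_context`, the logger name and the `cart_total` function are assumptions made for the example.

```python
import functools
import logging

logger = logging.getLogger("checkout")  # hypothetical logger name


def capture_context(func):
    """Log the local variables of the failing frame, then re-raise."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Walk to the innermost frame, where the exception was raised.
            tb = exc.__traceback__
            while tb.tb_next is not None:
                tb = tb.tb_next
            snapshot = {
                "function": tb.tb_frame.f_code.co_name,
                "line": tb.tb_lineno,
                # repr() keeps the snapshot loggable. In a real system,
                # scrub payment details and other sensitive values first.
                "locals": {k: repr(v)
                           for k, v in tb.tb_frame.f_locals.items()},
            }
            logger.error("Unhandled %s: %s | context=%s",
                         type(exc).__name__, exc, snapshot)
            exc.snapshot = snapshot  # attached so callers can inspect it
            raise
    return wrapper


@capture_context
def cart_total(prices, quantity):
    subtotal = sum(prices)
    return subtotal / quantity  # fails when quantity is 0
```

Calling `cart_total([10, 20], 0)` raises `ZeroDivisionError` as usual, but the log entry now carries the values of `prices`, `quantity` and `subtotal` at the point of failure, which is the kind of context a developer otherwise has to reconstruct by hand.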

Final thoughts

In the end, it doesn’t matter what time of year it is – you want your application to be ready for any scenario, and you never want reliability concerns to derail your release schedule. By incorporating the above practices into your workflow, you can keep your application available and keep downtime to a minimum all year long.

Nicole is a communications and product marketing manager at OverOps. Her expertise includes technologies ranging from artificial intelligence and predictive analysis to DevOps, incident management and more.