Editor’s Note: This post was originally published on May 5, 2016. It has since been updated to reflect advancements in the industry.
It seems like everyone is into microservices these days, and monolith architectures are slowly fading into obscurity.
Trends come and go, of course, and the attention they get is often exaggerated and doesn’t reflect what’s really going on. With microservices, though, there seems to be more consensus that the trend is here to stay. It makes sense. Conceptually, microservices extend the same principles that engineers have employed for decades.
In this post, we’ll examine the top challenges associated with building and deploying microservices and see how you can solve them.
The flip side of microservices
Separation of Concerns (SoC), a design principle stating that software should be built with distinct sections determined by “concern” or overall function, has been employed for more than 30 years to dictate how technology should be built. In monolithic applications, it is reflected in the separation of Presentation, Business and Data Layers in a typical 3-tier architecture.
Microservices take this concept and flip it on its head. Instead of layering a single code base, they split the same application into services that can each be built and deployed independently.
The benefits are huge but they come at a price, usually reflected in higher operations costs in terms of both time and money. Aside from the enormous upfront investment that comes with transitioning an existing application to containers, maintaining that application creates new challenges.
Despite all of the hype around troubleshooting challenges with microservices, the real challenge is understanding how the system really works.
Problem #1: As if monitoring a monolith wasn’t hard enough
While monolithic applications have their own challenges, the process for rolling back a “bad” release in a monolith is fairly straightforward. In a containerized application, things are much more complicated. Whether you’re gradually breaking down a monolithic app to microservices or building a new system from scratch, you now have more services to monitor. Each of these will likely:
- Use different technologies and/or languages
- Live on a different machine and/or container
- Have its own version control
With this, the system becomes highly fragmented and a stronger need arises for centralized monitoring and logging.
Components of a containerized application can be built and deployed independently of each other, so if something breaks after deployment, we first need to identify which services need to be rolled back (not always as easy as it sounds) and consider how rolling them back will impact the other services.
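To make the rollback question concrete, here is a minimal sketch of working out which services could be affected when one service is rolled back. It assumes you can describe your system as a map from each service to the services it calls; the service names and the `affected_by_rollback` helper are hypothetical and only for illustration.

```python
from collections import deque

# Hypothetical call graph: each service maps to the services it calls.
CALLS = {
    "web": ["orders", "auth"],
    "orders": ["payments", "inventory"],
    "payments": ["auth"],
    "inventory": [],
    "auth": [],
}

def affected_by_rollback(calls, service):
    """Return every service that calls `service`, directly or transitively."""
    # Invert the graph so we can walk from a service to its callers.
    callers = {name: set() for name in calls}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first walk over the inverted graph.
    seen = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

print(sorted(affected_by_rollback(CALLS, "payments")))
```

In this made-up graph, rolling back `payments` touches `orders` and, through it, `web`. A real system would need the graph to be generated from actual traffic or configuration rather than maintained by hand.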
Takeaway #1: If you thought monitoring a monolith architecture was hard, it’s 10x harder with microservices and requires a bigger investment in proactive measures.
Problem #2: Logging is distributed between services
When talking about monitoring an application, one of the first things to come up is logs, logs, logs. The IT equivalent of carbon emissions, gigabytes of unstructured text are generated by servers every day, culminating in overflowing hard drives and crazy ingestion, storage and tooling costs.
Even with a monolith architecture, your logs are probably already causing your engineers some headaches. They’re notoriously shallow and rarely contain useful variable values or provide direct insight into Root Cause. Even in a monolith, the code path crosses multiple layers, so logs relating to the same issue may end up in different places.
With microservices, your logs become even more scattered. A simple user transaction can now pass through many services, each of which may have its own logging framework. To troubleshoot an issue, you’ll have to pull the logs from every service that transaction could have passed through to understand what went wrong.
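One common way to make scattered logs easier to stitch back together is to tag every log line with an ID that follows the transaction from service to service. The sketch below shows the idea using only Python’s standard library. It is not how OverOps works; the `build_logger` and `handle_request` helpers and the JSON field names are assumptions made for this example.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the ID of the transaction currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    """Copy the current transaction ID onto every log record."""
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so logs stay machine-readable."""
    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        return json.dumps({
            "service": self.service,
            "level": record.levelname,
            "correlation_id": record.correlation_id,
            "message": record.getMessage(),
        })

def build_logger(service, stream):
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    handler.addFilter(CorrelationFilter())
    logger.addHandler(handler)
    return logger

def handle_request(logger, incoming_id=None):
    # Reuse the caller's ID if one arrived; otherwise start a new one.
    cid = incoming_id or uuid.uuid4().hex
    token = correlation_id.set(cid)
    try:
        logger.info("request handled")
    finally:
        correlation_id.reset(token)
    return cid
```

Each service would pass the ID along with its outgoing calls (for example in a request header) so the next service can hand it to `handle_request` as `incoming_id`.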
At OverOps, our team solves this by… using OverOps on OverOps. First, we don’t rely on logs to troubleshoot our application. The OverOps Platform identifies all errors and slowdowns even if they aren’t logged. When we do see an error in the logs, we inject a link into the log which leads to our True Root Cause screen.
There, we can see the full context of what caused the issue including which application, service or container was involved, the full stack trace and variable state at every frame, even if it’s distributed between a number of services or machines, plus system metrics and much more.
Takeaway #2: Microservices are all about breaking things down into individual components. As a side effect, Ops procedures and monitoring are also broken down by service and lose the ability to monitor the system as a whole. The challenge here is to re-centralize using proper tooling.
Problem #3: An issue that’s caused by one service can cause trouble elsewhere
If you follow up on a broken transaction in a specific service, you can’t guarantee that the same service you’re looking at is to blame.
In reality, there are several possible scenarios to explain what’s going on:
- The input it received is bad, so you need to understand what made the previous service misbehave
- Its output is bad because the following service returned an unexpected response, in which case you need to understand how the next service behaves
- The likely scenario: dependencies are more complex than 1-to-1, and there’s more than one service contributing to the problem
Whatever the problem is, the first step with microservices is to understand where to start looking for answers. The data is scattered all over the place, and might not even be accessible from within your logs and dashboard metrics at all.
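As a rough illustration of “where to start looking,” the sketch below merges log entries from several services into a single timeline for one transaction and picks out the earliest error. It assumes every service writes structured entries that carry a shared transaction ID (`txn`) and a timestamp (`ts`); the data, field names and helper functions are all made up for this example.

```python
from operator import itemgetter

# Hypothetical structured log entries, already collected from three services.
LOGS = {
    "gateway": [
        {"ts": 1.00, "txn": "t-42", "level": "INFO", "msg": "received order"},
        {"ts": 1.90, "txn": "t-42", "level": "ERROR", "msg": "upstream returned 500"},
    ],
    "orders": [
        {"ts": 1.20, "txn": "t-42", "level": "INFO", "msg": "validating order"},
        {"ts": 1.80, "txn": "t-42", "level": "ERROR", "msg": "payment failed"},
        {"ts": 1.30, "txn": "t-77", "level": "INFO", "msg": "validating order"},
    ],
    "payments": [
        {"ts": 1.50, "txn": "t-42", "level": "ERROR", "msg": "card token missing"},
    ],
}

def timeline(logs_by_service, txn):
    """Merge every entry for one transaction into a single ordered list."""
    merged = [
        {**entry, "service": service}
        for service, service_entries in logs_by_service.items()
        for entry in service_entries
        if entry["txn"] == txn
    ]
    return sorted(merged, key=itemgetter("ts"))

def first_error(logs_by_service, txn):
    """Return the earliest ERROR entry for the transaction, or None."""
    return next(
        (e for e in timeline(logs_by_service, txn) if e["level"] == "ERROR"),
        None,
    )

suspect = first_error(LOGS, "t-42")
print(suspect["service"], "-", suspect["msg"])
```

Here the gateway is where the failure is noticed, but the earliest error sits in `payments`. Treat the result as a starting point rather than a verdict: clocks on different machines are rarely perfectly in sync, and the earliest logged error is not necessarily the root cause.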
Takeaway #3: With monoliths, you usually know that you’re at least looking in the right direction. Microservices make it harder to understand what the source of the issue is and where you should get your data from.
Problem #4: Finding the root cause of problems
At this point, you’ve nailed down the problematic services and pulled out all the data there is to pull, including stack traces and some variable values from the logs. You probably also have some kind of APM solution like New Relic, AppDynamics or Dynatrace (which we also wrote about here and here). From there, you’ll get some additional data about unusually high processing times for some of the related methods.
But… what about… the actual Root Cause of the issue?
The first few bits of variable data you (hopefully) get from the log most likely won’t be the ones that move the needle. They’re usually more like breadcrumbs leading in the direction of the next clue and not much further. At this point, we need to do what we can to uncover more of the “magic” under the hood of our application.
In desperate times, that might mean adding more logging to a particular service and redeploying, hoping the issue will resurface with more context. Ideally, we’d be able to avoid such a scenario by proactively setting up monitoring solutions that identify what our own foresight can’t.
Takeaway #4: When the root cause of an error spans multiple services, it’s critical to have a centralized root cause detection tool in place. OverOps helps teams be proactive by monitoring applications at all stages of the SDLC, identifying all issues (including uncaught exceptions) and providing full context for quick resolution time.
Problem #5: Version management and cyclic dependencies between services
Another issue, one that we brought up before but that we think is worth highlighting, is the transition from a layer model in the typical monolithic architecture to a graph model with microservices.
During the transition, there are two problems that can happen related to keeping your dependencies in check.
1. If you have a cycle of dependencies between your services, you’re vulnerable to distributed stack overflow errors, where a transaction gets stuck in a loop of calls between services.
2. If two services share a dependency and you update its API for one of them in a way that affects the other, then you’ll need to update both.
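The first of these problems can be checked for ahead of time. As a minimal sketch, assuming you can export your service dependencies as a simple map of service to called services, a depth-first search will report a cycle if one exists. The service names and the `find_cycle` helper are hypothetical.

```python
def find_cycle(calls):
    """Return one dependency cycle as a list of services, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {name: WHITE for name in calls}
    path = []

    def visit(service):
        state[service] = GREY
        path.append(service)
        for callee in calls.get(service, ()):
            if state.get(callee, WHITE) == GREY:
                # The callee is already on the current path: cycle found.
                return path[path.index(callee):] + [callee]
            if state.get(callee, WHITE) == WHITE:
                cycle = visit(callee)
                if cycle:
                    return cycle
        path.pop()
        state[service] = BLACK
        return None

    for service in list(calls):
        if state[service] == WHITE:
            cycle = visit(service)
            if cycle:
                return cycle
    return None

# Hypothetical dependency map with a loop between three services.
CYCLIC = {
    "orders": ["billing"],
    "billing": ["notifications"],
    "notifications": ["orders"],
    "auth": [],
}

cycle = find_cycle(CYCLIC)
print(" -> ".join(cycle) if cycle else "no cycles found")
```

A check like this could run in a build pipeline whenever the dependency map changes. It only catches static cycles; a loop that appears only for certain inputs at runtime would still need other safeguards, such as a limit on how many hops a request may make.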
More services mean different release cycles for each of them, adding to this complexity. Reproducing a problem can prove to be very difficult when it disappears in one version and reappears in a later one.
Takeaway #5: In a microservice architecture, you’re even more vulnerable to errors rooted in dependency issues.
As with most advancements in the tech industry, microservices take a familiar concept and flip it on its head. They rethink the way that large-scale applications should be designed, built and maintained.
With them come many benefits, but also new challenges. When we look at these 5 main challenges together, we can see that they all stem from the same idea. The real challenge that microservices introduce is refamiliarizing ourselves with how our applications and systems really work.
The bottom line is that adopting a new technology like microservices demands new core capabilities in order to be successful. OverOps empowers teams to overcome the challenges that microservices present by accessing the data needed and filling the gaps that traditional tooling doesn’t cover. Watch a live demo here.