Service Maps are a lie
Service Maps are a common feature of APM (application performance monitoring) solutions today. They are marketed as tools to help you understand the communication graph of your services. In reality, they lack the context needed to give you accurate information, and they can even lead you to misunderstand the flow of requests in your system.
A simple example
Here is a graph of flights between 3 cities. It says there are 200 passengers flying from San Francisco to New York. It doesn't say how many of those 200 passengers started in Seattle and are only using San Francisco as a connection. Looking at this graph, we might also conclude that everyone leaving Seattle is going only to San Francisco, when in reality some of those passengers may be traveling to New York with a connection in San Francisco.

We have connected 2 disjoint pieces of information, flights leaving San Francisco and flights leaving Seattle, without any context. With that context added, as in the diagram below, things are much clearer.

We can now see that of the 100 Seattle passengers flying to San Francisco, 80 have a connecting flight to New York. We now know exactly how many passengers are flying to New York from each city: 80 from Seattle and 120 from San Francisco.
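To make the arithmetic concrete, here is a small Python sketch. It assumes a hypothetical representation in which each itinerary is a route (airports in order) plus a passenger count, and it reuses only the numbers above. It builds the per-flight graph from two different sets of itineraries, one with the 80 connecting passengers and one with no connections at all.

```python
from collections import Counter

# Hypothetical itineraries: (route, passengers). Only the text's numbers are used.
with_connections = [
    (("Seattle", "San Francisco", "New York"), 80),  # connect in San Francisco
    (("Seattle", "San Francisco"), 20),
    (("San Francisco", "New York"), 120),
]
without_connections = [
    (("Seattle", "San Francisco"), 100),
    (("San Francisco", "New York"), 200),
]

def flight_graph(itineraries):
    """Collapse itineraries into per-flight passenger counts, like the first graph."""
    edges = Counter()
    for route, passengers in itineraries:
        # Each consecutive pair of airports is one flight leg.
        for leg in zip(route, route[1:]):
            edges[leg] += passengers
    return edges

def arrivals_by_origin(itineraries, destination):
    """Count passengers reaching `destination`, keyed by where their trip started."""
    origins = Counter()
    for route, passengers in itineraries:
        if route[-1] == destination:
            origins[route[0]] += passengers
    return origins

print(flight_graph(with_connections) == flight_graph(without_connections))
print(arrivals_by_origin(with_connections, "New York"))
print(arrivals_by_origin(without_connections, "New York"))
```

Both itinerary sets collapse to the same flight graph (100 passengers from Seattle to San Francisco, 200 from San Francisco to New York), yet counting by origin gives 80 from Seattle and 120 from San Francisco in one case and 200 from San Francisco in the other. The per-flight graph alone cannot tell them apart.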
Here is a Service Map from Datadog. Try answering the following questions using only the map:

- How many calls to postgresql are made by requests that came only from du-router?
- How do the calls change over time?
- Are there direct requests to du-coord, not coming via another service, that result in calls to postgresql?
- Is du-coord running a periodic job that makes calls to postgresql or du-indexer?
- Do incoming requests from du-indexer result in separate calls being made back to du-indexer (resulting in a cycle), or are calls to du-indexer from du-coord made only on incoming requests via du-router?
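Questions like the first and third require knowing where each request entered the system, which is what per-request traces record and an edge graph discards. The minimal Python sketch below assumes a hypothetical Span record (trace_id, span_id, parent_id, service) and invented spans, not anything exported from Datadog. It groups postgresql calls by the service where their trace started.

```python
from collections import Counter
from typing import NamedTuple, Optional

class Span(NamedTuple):
    # Hypothetical minimal span record; real tracing systems carry many more fields.
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the span where the request entered the system
    service: str

# Invented spans for illustration only.
spans = [
    Span("t1", "a", None, "du-router"),
    Span("t1", "b", "a", "du-coord"),
    Span("t1", "c", "b", "postgresql"),
    Span("t2", "d", None, "du-coord"),      # direct request to du-coord
    Span("t2", "e", "d", "postgresql"),
    Span("t2", "f", "d", "postgresql"),
    Span("t3", "g", None, "du-indexer"),
    Span("t3", "h", "g", "du-coord"),
    Span("t3", "i", "h", "du-indexer"),
]

def calls_by_entry_service(spans, target):
    """Count spans of `target`, grouped by the service where each trace started."""
    # The entry span is the one without a parent.
    entry = {s.trace_id: s.service for s in spans if s.parent_id is None}
    counts = Counter()
    for s in spans:
        if s.service == target:
            counts[entry[s.trace_id]] += 1
    return counts

print(calls_by_entry_service(spans, "postgresql"))
```

On this illustrative data, one postgresql call comes from a request that entered through du-router and two come from direct requests to du-coord. A service map drawn from the same spans would show only a single du-coord to postgresql edge.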
The Service Map has simply connected two disjoint pieces of information:
- There are 2 services (du-router and du-indexer) that talk to du-coord.
- du-coord talks to 2 different services (postgresql and du-indexer).
The map is showing correlation when we are looking for causation.
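The gap between correlation and causation is easiest to see with cycles. The Python sketch below, again assuming a hypothetical Span record and invented traces, builds the caller/callee edge set a service map would draw and, separately, walks each trace's call tree looking for a service that is re-entered after the request passed through another service. Two different sets of traces produce the same map, but only one of them contains the du-indexer to du-coord to du-indexer cycle from the last question above.

```python
from typing import NamedTuple, Optional

class Span(NamedTuple):
    # Hypothetical minimal span record; real tracers carry many more fields.
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the entry span of a trace
    service: str

def _index(spans):
    return {(s.trace_id, s.span_id): s for s in spans}

def service_map(spans):
    """Caller -> callee edges between different services, with trace identity dropped."""
    by_id = _index(spans)
    edges = set()
    for s in spans:
        parent = by_id.get((s.trace_id, s.parent_id))
        if parent is not None and parent.service != s.service:
            edges.add((parent.service, s.service))
    return edges

def traces_with_cycles(spans):
    """Trace ids where a service is re-entered after passing through another service."""
    by_id = _index(spans)
    cyclic = set()
    for s in spans:
        left_service = False
        parent = by_id.get((s.trace_id, s.parent_id))
        while parent is not None:  # walk up the ancestors of this span
            if parent.service != s.service:
                left_service = True
            elif left_service:
                cyclic.add(s.trace_id)
                break
            parent = by_id.get((parent.trace_id, parent.parent_id))
    return cyclic

# Invented traces: t2 loops du-indexer -> du-coord -> du-indexer.
with_cycle = [
    Span("t1", "a", None, "du-router"),
    Span("t1", "b", "a", "du-coord"),
    Span("t1", "c", "b", "postgresql"),
    Span("t2", "d", None, "du-indexer"),
    Span("t2", "e", "d", "du-coord"),
    Span("t2", "f", "e", "du-indexer"),
]
# Invented traces: du-coord calls du-indexer only on requests from du-router.
without_cycle = [
    Span("t1", "a", None, "du-router"),
    Span("t1", "b", "a", "du-coord"),
    Span("t1", "c", "b", "du-indexer"),
    Span("t2", "d", None, "du-indexer"),
    Span("t2", "e", "d", "du-coord"),
    Span("t2", "f", "e", "postgresql"),
]

print(service_map(with_cycle) == service_map(without_cycle))
print(traces_with_cycles(with_cycle), traces_with_cycles(without_cycle))
```

Both trace sets yield the same four edges, matching the topology described above (du-router and du-indexer call du-coord, which calls postgresql and du-indexer). Only the per-trace walk reveals that the cycle exists in one set (trace t2) and not in the other.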
The Service Map simply doesn't have enough information for you to understand and debug the flow of requests through your system. It is also only a point-in-time graph, so it doesn't show you how the communication between entities changed over time.
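A static map keeps only one number per edge. As a sketch, assuming hypothetical call records of the form (timestamp in seconds, caller, callee) and an arbitrarily chosen 60-second window, the Python below counts calls per edge in fixed time windows and contrasts that with the single total a point-in-time map would keep.

```python
from collections import Counter, defaultdict

# Hypothetical call records: (timestamp in seconds, caller, callee).
calls = [
    (0, "du-coord", "postgresql"),
    (30, "du-coord", "postgresql"),
    (70, "du-coord", "du-indexer"),
    (75, "du-coord", "postgresql"),
    (130, "du-coord", "du-indexer"),
]

def calls_per_window(calls, window_seconds=60):
    """Count calls on each (caller, callee) edge, per fixed time window."""
    series = defaultdict(Counter)
    for ts, caller, callee in calls:
        window_start = ts - ts % window_seconds  # align to the window boundary
        series[(caller, callee)][window_start] += 1
    return series

def point_in_time_map(calls):
    """What a single static map keeps: one total per edge, with time discarded."""
    return Counter((caller, callee) for _, caller, callee in calls)

for edge, per_window in calls_per_window(calls).items():
    print(edge, dict(per_window))
print(point_in_time_map(calls))
```

The windowed counts show postgresql traffic tapering off while du-indexer calls appear later; the static totals (3 and 2) hide that shift entirely.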