Alright, so first of all, the colors don't matter; they're just there for contrast/legend purposes. Flame graphs come in all sorts of color schemes, not necessarily yellow/orange/red.
I tend to read flame graphs in terms of % of the overall process. They're good for finding the one part of a routine that is taking up a sizable % of the processing time, and for seeing whether that part is getting hung up on some mundane task.
For example, if I see 3 or 4 frames stacked directly on top of each other at about the same width, then I know I've got a call stack a few levels deep that is overwhelmingly waiting on some low-level thing to finish. If it's something that should be really fast (like a cache lookup), then I know something really stupid is happening.
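To make the "% of the overall process" idea concrete, here is a minimal sketch that computes those percentages by hand. It assumes your profiler can export "folded" stacks (one line per stack, frames separated by semicolons, then a sample count), which is the input format the common FlameGraph scripts use. The frame names and counts are made up for illustration.

```python
from collections import Counter

# Hypothetical profile in folded form: "frame;frame;frame count"
FOLDED = """\
main;run_payroll;load_data;cache_lookup 75
main;run_payroll;load_data;parse_rows 5
main;run_payroll;write_report 20
"""

def inclusive_percent(folded: str) -> dict:
    """Share of all samples in which each frame appears somewhere on the stack."""
    totals = Counter()
    grand_total = 0
    for line in folded.splitlines():
        if not line.strip():
            continue
        stack, count_text = line.rsplit(" ", 1)
        count = int(count_text)
        grand_total += count
        # set() so a recursive frame is counted once per stack
        for frame in set(stack.split(";")):
            totals[frame] += count
    return {frame: 100.0 * n / grand_total for frame, n in totals.items()}

percents = inclusive_percent(FOLDED)
for frame, pct in sorted(percents.items(), key=lambda kv: -kv[1]):
    print(f"{frame:15s} {pct:5.1f}%")
```

In this made-up profile, load_data is 80% of the run and cache_lookup alone is 75%, which is the "something that should be fast is eating everything" shape described above.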
Some flame graphs will tie in e.g. network requests and DB queries as their own traces, which will also give you a clue sometimes. Like, oh, this function is waiting 10s for a query to complete? Let's see what that is actually doing, maybe we can speed it up.
I used flame graphs (among other things) this year to take a 30-minute payroll process down to about 3 minutes. Much of this was just scanning the flame graph for things high in the call stack that were taking up a noticeable % of the processing time. This is easier if you know the codebase, but for example, I could see within the "load a bunch of data" phase of our processing that there were a few tax-related things taking up most of the overall time. We managed to trace those to a few calls to a third-party library that we couldn't make any faster, but we could cache the results to mitigate the issue.
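A rough sketch of what caching a slow third-party call can look like, not the actual payroll code. The function `third_party_tax_rate` and its arguments are hypothetical stand-ins, and this only works if the result depends purely on the arguments.

```python
import functools
import time

CALLS = {"count": 0}  # counts how often the slow call really runs

def third_party_tax_rate(region: str, year: int) -> float:
    """Hypothetical stand-in for a slow library call we can't speed up."""
    CALLS["count"] += 1
    time.sleep(0.01)  # simulate the expensive work
    return 0.2 if region == "A" else 0.3

@functools.lru_cache(maxsize=None)
def cached_tax_rate(region: str, year: int) -> float:
    # Same arguments -> reuse the earlier result instead of calling again
    return third_party_tax_rate(region, year)

# 100 lookups, but only 2 distinct (region, year) pairs
employees = [("A", 2024), ("B", 2024)] * 50
rates = [cached_tax_rate(region, year) for region, year in employees]
print(CALLS["count"])  # 2
```

The library call is exactly as slow as before; it just runs once per distinct input instead of once per employee.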
In another place we found expensive queries being repeated in different functions, which was obvious because the same query function showed up under both callers. We ended up just raising those shared calls up a level and passing the data directly into the two functions that needed it.
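Here is a small sketch of that "raise the call up a level" refactor. All names (`fetch_pay_periods`, `run_report`, and so on) are hypothetical, and the query is faked with a list so the example runs on its own.

```python
QUERY_LOG = []  # records every time the "expensive query" runs

def fetch_pay_periods(company_id):
    """Hypothetical expensive query, faked with static data."""
    QUERY_LOG.append(company_id)
    return [{"period": 1, "gross": 1000}, {"period": 2, "gross": 1200}]

# Before: each function runs the query itself
def total_gross_before(company_id):
    return sum(p["gross"] for p in fetch_pay_periods(company_id))

def period_count_before(company_id):
    return len(fetch_pay_periods(company_id))

# After: the functions take the data as an argument
def total_gross(periods):
    return sum(p["gross"] for p in periods)

def period_count(periods):
    return len(periods)

def run_report(company_id):
    periods = fetch_pay_periods(company_id)  # query once, one level up
    return {"total_gross": total_gross(periods),
            "period_count": period_count(periods)}

total_gross_before(42)
period_count_before(42)
queries_before = len(QUERY_LOG)  # 2

QUERY_LOG.clear()
report = run_report(42)
queries_after = len(QUERY_LOG)  # 1
```

A side benefit is that the two functions become easier to test, since they no longer need a database to run.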
Other places were less obvious. We could see a lot of time being spent, but we couldn't tell from the flame graph alone what was happening. We'd go look at the code and find some O(n^2) aggregation function that we needed to simplify.
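The post doesn't say what those aggregations looked like, so this is only an illustrative guess at the general shape: a per-employee total that rescans every line item, next to a single-pass version that gives the same answer. The data and function names are made up.

```python
from collections import defaultdict

def totals_quadratic(employee_ids, line_items):
    # For every employee, rescan every line item: O(employees * items)
    return {
        e: sum(amount for emp, amount in line_items if emp == e)
        for e in employee_ids
    }

def totals_linear(employee_ids, line_items):
    # One pass over the items to build the sums, then one pass over employees
    sums = defaultdict(int)
    for emp, amount in line_items:
        sums[emp] += amount
    return {e: sums.get(e, 0) for e in employee_ids}

employee_ids = [1, 2, 3]
line_items = [(1, 100), (2, 50), (1, 25)]

slow = totals_quadratic(employee_ids, line_items)
fast = totals_linear(employee_ids, line_items)
print(slow == fast)  # True
```

In a flame graph the quadratic version tends to show up as one wide frame with nothing interesting underneath it, which matches the "lots of time, no obvious cause" experience.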
Overall, flame graphs are just one tool. They might not even be the best tool. In our case (a heavily data-driven web application) I would place DB observability at least as high in importance as good tracing, and flame graphs are just one way of visualizing traces.