Oddly enough I've generated similar graphs for stack tracing and profiling in the far far distant past and had not realised they'd been named and formalised.

For anyone else who has not recognised the name, I found these origin links which may interest some:

Abstract [2]:
Flame graphs are a simple stack trace visualization that helps answer an everyday problem: how is software consuming resources, especially CPUs, and how did this change since the last software version?
Flame graphs have been adopted by many languages, products, and companies, including Netflix, and have become a standard tool for performance analysis.
They were published in "The Flame Graph" article in the June 2016 issue of Communications of the ACM, by their creator, Brendan Gregg.

This paper appears to be formalising the nature of a flame graph, recognising that they rest on data from sampling profilers and as such suffer a "never the same twice" variation when repeatedly run, and thus seeking to quantify "distance" of one graph from another as well as "average" of multiple graphs (for the "one true mean reference graph" I guess).

The motivation is then to be able to reliably measure "distance" from one group of profiling runs to another, before and after changes of interest.

I dare say much more is possible, but I fear I've barely skimmed the material before bedtime :/

BlackFly 7 days ago [-]

I don't think this algebra is a full description of flame graphs but it feels like it captures part of what you want for the limited purpose of global performance analysis.

I feel like a flame graph cannot be an element of a vector space since vectors are commutative and the stacks of a flame graph are ordered. From a performance perspective, ignoring the ordering can simplify the algebra (apparently by turning it into a vector space) and will still correspond with your overall impression of "time spent in a frame".

Specifically,

|A|B |A|
|C|C |C|

is (1, C;A), (2, C;B), (1, A;C) but as (1, C;A) + (2, C;B) + (1, A;C) with vector addition we can rearrange this to (2, C;A) + (2, C;B)

or

|A |B |
|C |C |

which of course still captures the time spent in C;A and C;B, but it has lost some information. We think of it as a different flame graph. Of course, ignoring that information can be meaningful to a performance analysis.

Indeed the ordering can only be important to application semantics. This is clearly accentuated by the norm, which is even destroying the information of the basis elements.

m1el 7 days ago [-]

The information loss you're describing is going from Flame Charts to Flame Graphs.
Yes, the call ordering, and multiple calls are lost if you're generating a Flame Graph.

HelloNurse 6 days ago [-]

Following the terminology of the article, CA and AC are two different stacks (C calling A vs. A calling C) so they cannot be merged. Instead, the whole flame graphs form a vector space, with a dimension for each stack (with the provision that for many purposes only the positive cone makes sense). There is no meaningful operation between stacks.

TYMorningCoffee 6 days ago [-]

What ordering do flame graphs have? You can move the stacks around, provided that the methods earlier in the stack match, since the graph only shows proportion of that stack chain.

nirinor 6 days ago [-]

Very nice formalization.

One area for refinement: it considers two stacks either identical or unrelated. Consider that stack A;B is actually very close to A;B;C, the difference might be due to a sample time occurring just before or just after the call to C. OP considers them just as different as A;B and Z;W, therefore amplifying a measurement noise.

This suggests using a refined metric between stacks (e.g., an edit distance counting pushes and pops), and then we can use it in defining the metric between flamegraphs (e.g., an optimal transport metric [1], instead of the proposed L1).

Avoiding that noise amplification reduces the background noise level, therefore the cost of effective measurements. From another perspective, the current OP scheme creates an avoidable curse of dimensionality in the form of the Hotelling test's requirement that each sample has more measurements have more samples than distinct stack frames. So the same code split into more functions is harder to measure, and too-small samples are useless. I think neither of those is necessary if we take stack similarity into account.

As a developer who doesn't use mathematical notation very often, I've had some difficulty reading it.
From my understanding, the paper goes as follows:

2.1 Define FlameGraph

Frame = something that identifies function/location in the code
Stack = Vec<Frame>
FlameGraph = Map<Stack, Real>
FlameGraphPositive = Map<Stack, RealPositive>

2.2 Define FlameChart

FlameChart = Vec<{ start: Real, flame_graph: FlameGraphPositive }>
# such that the start values are strictly ordered
fn to_flamegraph(flame_chart: FlameChart) -> FlameGraphPositive {
flame_chart
.map(|{ start, flame_graph }| flame_graph)
.sum()
}

Personally, I think a better definition of FlameChart is:

FlameChart = Vec<{ end_time: RealPositive, stack: Stack }>
# such that sampling starts at 0 and Vec is strictly ordered by end_time
fn to_flamegraph(flame_chart: FlameChart) -> FlameGraphPositive {
let mut start = 0.0;
flame_chart.map(|{ end_time, stack }| {
let elapsed = end_time - start;
start = end_time;
FlameGraphPositive::from_stack(stack, elapsed)
}).sum()
}

3.1 Define diff

# a,b: FlameGraphPositive
fn diff(a, b) -> FlameGraph;
# such that a + diff = b
# diff: FlameGraph
# add, sub: FlameGraphPositive
fn to_positive(diff) -> { add, sub };
# such that diff = add - sub

Using Hotelling T^2 Test, the author defines a metric, which takes sampling variance into account, and allows to detect performance regression, even when the sampling profile is noisy. (you'd have to read the paper for the exact method)

I'm confused, what are the takeaways, and why are there 83 upvotes with no discussion?

vrnvu 6 days ago [-]

I upvoted it because the paper is simple and easy to follow. It defines some properties of flame graphs and makes us of them to make conclusions about performance regression, A/B testing and how to have better profiling results removing noise.

yccs27 6 days ago [-]

The paper looks interesting but it seems no one here understands it, so people upvote it in the hopes that someone knowledgeable will see it and summarize it in the comments.

KyleBerezin 6 days ago [-]

A link to a paper on the usefulness of a certain type of graph... Never change HN

For anyone else who has not recognised the name, I found these origin links which may interest some:

[1] https://www.brendangregg.com/flamegraphs.html[2] https://www.usenix.org/conference/atc17/program/presentation...

(hour long usenix talk of [2]) https://www.youtube.com/watch?v=D53T1Ejig1Q

This paper appears to be formalising the nature of a flame graph, recognising that they rest on data from

sampling profilersand as such suffer a "never the same twice" variation when repeatedly run, and thus seeking to quantify "distance" of one graph from another as well as "average" of multiple graphs (for the "one true mean reference graph" I guess).The motivation is then to be able to

reliablymeasure "distance" from one group of profiling runs to another, before and after changes of interest.I dare say much more is possible, but I fear I've barely skimmed the material before bedtime :/

I feel like a flame graph cannot be an element of a vector space since vectors are commutative and the stacks of a flame graph are ordered. From a performance perspective, ignoring the ordering can simplify the algebra (apparently by turning it into a vector space) and will still correspond with your overall impression of "time spent in a frame".

Specifically,

is (1, C;A), (2, C;B), (1, A;C) but as (1, C;A) + (2, C;B) + (1, A;C) with vector addition we can rearrange this to (2, C;A) + (2, C;B)or

which of course still captures the time spent in C;A and C;B, but it has lost some information. We think of it as a different flame graph. Of course, ignoring that information can be meaningful to a performance analysis.Indeed the ordering can only be important to application semantics. This is clearly accentuated by the norm, which is even destroying the information of the basis elements.

stacks(C calling A vs. A calling C) so they cannot be merged. Instead, the whole flame graphs form a vector space, with a dimension for each stack (with the provision that for many purposes only the positive cone makes sense). There is no meaningful operation between stacks.One area for refinement: it considers two stacks either identical or unrelated. Consider that stack A;B is actually very close to A;B;C, the difference might be due to a sample time occurring just before or just after the call to C. OP considers them just as different as A;B and Z;W, therefore amplifying a measurement noise.

This suggests using a refined metric between stacks (e.g., an edit distance counting pushes and pops), and then we can use it in defining the metric between flamegraphs (e.g., an optimal transport metric [1], instead of the proposed L1).

Avoiding that noise amplification reduces the background noise level, therefore the cost of effective measurements. From another perspective, the current OP scheme creates an avoidable curse of dimensionality in the form of the Hotelling test's requirement that each sample has more measurements have more samples than distinct stack frames. So the same code split into more functions is harder to measure, and too-small samples are useless. I think neither of those is necessary if we take stack similarity into account.

[1] https://en.wikipedia.org/wiki/Wasserstein_metric

2.1 Define FlameGraph

2.2 Define FlameChart Personally, I think a better definition of FlameChart is: 3.1 Define diff 3.2 Define a naive diff metric 3.3 Define a more sophisticated diff metricUsing Hotelling T^2 Test, the author defines a metric, which takes sampling variance into account, and allows to detect performance regression, even when the sampling profile is noisy. (you'd have to read the paper for the exact method)

The code for this test is here: https://github.com/P403n1x87/flamegraph-experiment/blob/04db... https://github.com/P403n1x87/flamegraph-experiment/blob/04db...