Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

specs - profiling integration: Make host.id in registration message o… #900

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

florianl
Copy link
Member

@florianl florianl commented Dec 12, 2024

…ptional

host.id is a not well and uniquely defined attribute, see open-telemetry/semantic-conventions#581 for example. In particular on containerized environments profiling agents do see a different host.id than APM-agents, which makes it harder to correlate information. To being able to correlate profiling and APM information, container.id was identified to fit the use case best. As profiling as well as APM agents already collect and send out container.id with their respective data. For non containerized environment host.id still can be used and in such a use cases profiling agents and APM-agents should have the same understanding of host.id.

For backwards compatibility reasons just make the argument for host-id in the registration message optional.

  • Create PR as draft
  • Approval by at least one other agent
  • Mark as Ready for Review (automatically requests reviews from all agents and PM via CODEOWNERS)
    • Remove PM from reviewers if impact on product is negligible
    • Remove agents from reviewers if the change is not relevant for them
  • Merge after 2 business days passed without objections
    To auto-merge the PR, add /schedule YYYY-MM-DD to the PR description.

…ptional

`host.id` is a not well and uniquely defined attribute, see open-telemetry/semantic-conventions#581 for example. In particular on containerized environments profiling agents do see a different `host.id` than APM-agents, which makes it harder to correlate information.
To being able to correlate profiling and APM information, `container.id` was identified to fit the use case best. As profiling as well as APM agents already collect and send out `container.id` with their respective data. For non containerized environment `host.id` still can be used and in such a use cases profiling agents and APM-agents should have the same understanding of `host.id`.

For backwards compatibility reasons just make the argument for `host-id` in the registration message optional.
* *samples-delay-ms*: A sane upper bound of the usual time taken in milliseconds by the profiling host agent between the collection of a stacktrace and it being written to the apm-agent via the [messaging socket](#cpu-profiler-trace-correlation-message). The APM-agent will assume that all profiling data related to a span has been written to the socket if a span ended at least the provided duration ago. Note that this value doesn't need to be a hard a guarantee, but it should be the 99% case so that profiling data isn't distorted in the expected case.
* *host-id*: The [`host.id` resource attribute](https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/) used for the profiling data by this profiling host agent. If an APM-agent is already sending a `host.id` it MUST print a warning if the `host.id` is different and otherwise ignore the value received by the host agent. A mismatch will lead to certain correlation features (e.g. cost and CO2 consumption) not working. If an agent does not collect the `host.id` by itself, it MUST start sending the `host.id` after receiving it from the profiler host agent to ensure aforementioned correlation features work correctly.
* *samples-delay-ms*: A sane upper bound of the usual time taken in milliseconds by the profiling agent between the collection of a stacktrace and it being written to the apm-agent via the [messaging socket](#cpu-profiler-trace-correlation-message). The APM-agent will assume that all profiling data related to a span has been written to the socket if a span ended at least the provided duration ago. Note that this value doesn't need to be a hard a guarantee, but it should be the 99% case so that profiling data isn't distorted in the expected case.
* *host-id*: The [`host.id` resource attribute](https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/) is an optional argument used to correlate profiling data by the profiling agent. If an APM-agent is already sending a `host.id` it MUST print a warning if the `host.id` is different and otherwise ignore the value received by the profiling agent. A mismatch will lead to certain correlation features (e.g. cost and CO2 consumption) not working. If an APM-agent does not collect the `host.id` by itself, it MUST start sending the `host.id` after receiving it from the profiling agent to ensure aforementioned correlation features work correctly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When host.id is optional and no longer used for correlation, is logging a message still required (MUST)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the porposed optional host.id is sent by the profiling agent and APM agent is already sending a host.id and both host.IDs are different, then yes APM agent MUST log a warning. Event if the argument is optional, a collision of host.id values should be detected and trigger a warning.

specs/agents/universal-profiling-integration.md Outdated Show resolved Hide resolved
specs/agents/universal-profiling-integration.md Outdated Show resolved Hide resolved
@florianl florianl marked this pull request as ready for review December 12, 2024 10:28
@florianl florianl requested review from a team as code owners December 12, 2024 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants