This RFC proposes making `namespace` a first-class field on the internal `Metric` type so that it can be set in sources, manipulated in transforms, and used by sinks.
This RFC will cover:

- Separating the `namespace` of a metric into its own field on `Metric`
As we add metric sources like `apache_metrics` and `postgresql_metrics` that set their own namespaces (defaulting to `apache` and `postgresql`), it is becoming clearer that we may want to keep the namespace separate from the metric name so that it can be:

- easily manipulated in transforms (e.g. used as a filter)
- sent as a separate field to sinks that require it (e.g. `aws_cloudwatch_metrics`)
- formatted differently depending on sink convention (e.g. it looks like New Relic prefers `namespace.name` for metrics)
I believe the current implementation proposals for these metrics sources will simply prefix the name, as the `prometheus` and `statsd` sinks do. That would be difficult to use with the `aws_cloudwatch_metrics` sink, which requires the namespace as a separate field in its AWS API calls.
Additionally, I think separating it will make it more useful in transforms. Today, users can only emulate this by prefix-matching the metric name.
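Add `namespace` to `Metric`:

```rust
use std::collections::BTreeMap;

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

// `MetricKind` and `MetricValue` are the existing types and are omitted here.
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct Metric {
    pub name: String,
    pub namespace: Option<String>, // added
    pub timestamp: Option<DateTime<Utc>>,
    pub tags: Option<BTreeMap<String, String>>,
    pub kind: MetricKind,
    #[serde(flatten)]
    pub value: MetricValue,
}
```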
Metric sources can then optionally assign a `namespace` to the metric. For example, the upcoming MongoDB source would set this to `mongodb`.
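The sketch below shows what a source does with the new field: it sets the namespace on the metric and leaves the name unprefixed. It relies on a few assumptions:

- To compile with `std` only, it uses a simplified stand-in for `Metric`. It drops the timestamp and serde attributes and reduces `MetricValue` to a gauge.
- `mongodb_gauge` is a hypothetical helper, not part of any existing source.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins so this compiles with std only: the real `Metric`
// also carries a timestamp, and `MetricValue` has more variants.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
pub enum MetricKind {
    Incremental,
    Absolute,
}

#[derive(Debug, Clone, PartialEq)]
pub enum MetricValue {
    Gauge { value: f64 },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Metric {
    pub name: String,
    pub namespace: Option<String>,
    pub tags: Option<BTreeMap<String, String>>,
    pub kind: MetricKind,
    pub value: MetricValue,
}

/// Hypothetical helper a MongoDB source could use: the namespace is stored
/// on the metric instead of being baked into its name.
pub fn mongodb_gauge(name: &str, value: f64) -> Metric {
    Metric {
        name: name.to_string(),
        namespace: Some("mongodb".to_string()),
        tags: None,
        kind: MetricKind::Absolute,
        value: MetricValue::Gauge { value },
    }
}
```

For example, `mongodb_gauge("connections_current", 42.0)` yields a metric named `connections_current` whose `namespace` is `Some("mongodb")`. Each sink then decides how to combine the two.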
Sinks can then decide what to do with this prefix. For example, the `prometheus` sink would simply prefix the metric name with it, while `aws_cloudwatch_metrics` would use it as the `Namespace` field in `PutMetricData` requests.
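The two sink behaviors can be sketched as one encoding function that returns what a sink would send: an optional separate namespace field and the metric name.

- `NamespaceStyle` and `encode` are hypothetical names, not existing Vector APIs.
- The `_` separator matches the `apache_` prefixing of the `prometheus` sink.
- The `.` separator corresponds to the `namespace.name` convention that New Relic appears to prefer.

```rust
/// How a sink lays out a metric's namespace (hypothetical type, for illustration).
pub enum NamespaceStyle {
    /// Join namespace and name with a separator, e.g. `_` (Prometheus) or `.`.
    Prefix(&'static str),
    /// Keep the namespace as its own field, e.g. CloudWatch's `Namespace`.
    Separate,
}

/// Returns `(namespace_field, metric_name)` as the sink would send them.
pub fn encode(
    style: &NamespaceStyle,
    namespace: Option<&str>,
    name: &str,
) -> (Option<String>, String) {
    match (style, namespace) {
        // Prefixing sinks fold the namespace into the name and send no separate field.
        (NamespaceStyle::Prefix(sep), Some(ns)) => (None, format!("{}{}{}", ns, sep, name)),
        (NamespaceStyle::Prefix(_), None) => (None, name.to_string()),
        // Separating sinks pass namespace and name through unchanged.
        (NamespaceStyle::Separate, ns) => (ns.map(str::to_string), name.to_string()),
    }
}
```

Keeping the namespace on the metric means this choice is made once per sink instead of once per source.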
A pipeline might look like:
```toml
[sources.my_source_id]
type = "apache_metrics"
endpoints = ["http://localhost/server-status?auto"]
namespace = "apache"

[transforms.my_transform_id]
# General
type = "lua" # required
inputs = ["my_source_id"] # required
version = "2" # required

# Hooks
hooks.process = """
function (event, emit)
  if event.metric.namespace == "apache" then
    -- do something
  end
  emit(event)
end
"""

[sinks.prometheus]
type = "prometheus"
inputs = ["my_transform_id"]
address = "0.0.0.0:9598"
namespace = ""

[sinks.cloudwatch]
type = "aws_cloudwatch_metrics"
inputs = ["my_transform_id"]
namespace = ""
region = "us-east-1"
```
Here, the `prometheus` sink would output metrics with names prefixed by `apache_`, and `aws_cloudwatch_metrics` would send `apache` as the separate `Namespace` field in its AWS API calls.
Once #3609 ("Make the `namespace` option on metrics sinks optional") is done, the sinks could look something like:

```toml
[sinks.my_sink_id]
type = "prometheus"
inputs = ["my_transform_id"]
address = "0.0.0.0:9598"
default_namespace = "unknown"
```

Here, `default_namespace` would be applied to any metrics that do not already have a namespace.
Currently, I don't think there is a way to tell whether a metric already has a namespace, so sinks that require one cannot avoid adding a second.
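With `namespace` as an `Option` on the metric, a sink can make that check. Below is a minimal sketch of how a Prometheus-style sink might resolve a metric name under the proposed `default_namespace` option. `resolve_name` is a hypothetical function, and `_` is assumed as the separator.

```rust
/// Hypothetical name resolution for a Prometheus-style sink with the proposed
/// `default_namespace` option. The metric's own namespace wins; the default
/// is used only when the metric has none.
pub fn resolve_name(name: &str, namespace: Option<&str>, default_namespace: Option<&str>) -> String {
    match namespace.or(default_namespace) {
        // Assumes `_` as the separator, matching the `apache_` prefix above.
        Some(ns) if !ns.is_empty() => format!("{}_{}", ns, name),
        // No namespace at all (or an empty one): leave the name untouched.
        _ => name.to_string(),
    }
}
```

Suppose `default_namespace = "unknown"`. A metric from the `apache_metrics` source would then still be exported as `apache_<name>`, and only metrics without a namespace would become `unknown_<name>`.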
- Telegraf. It models this a bit differently: a `metric` has a number of fields. Its `name` is what we would call the `namespace`, and its `fields` are what we would generate individual metrics for.
- Not all metric sources may have the concept of a "namespace", so we'll need to figure out what to do with it in those cases. I think prefixing it, as `prometheus` does, would be a reasonable default.
- This namespace concept may be confusing to users if none of their sinks or sources use it.
We could opt to model metrics closer to how Telegraf does it where we would encode all of the metrics for a given source as one metric with a set of fields.
I didn't closely consider this option given that the proposed option seems reasonable and is a smaller change to the data model.
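To compare the two models, the sketch below maps a Telegraf-style metric (one `name` plus many `fields`) onto the proposed field, turning each field into its own metric with the Telegraf `name` as its `namespace`. It makes these assumptions:

- Both structs are simplified and hypothetical.
- Telegraf tags and timestamps are omitted.
- Field values are restricted to `f64`, although Telegraf fields can also hold other types.

```rust
use std::collections::BTreeMap;

/// Simplified, hypothetical shape of a Telegraf metric: one name, many fields.
pub struct TelegrafMetric {
    pub name: String,
    pub fields: BTreeMap<String, f64>,
}

/// Simplified stand-in for the proposed `Metric` (gauge value only).
#[derive(Debug, PartialEq)]
pub struct Metric {
    pub name: String,
    pub namespace: Option<String>,
    pub value: f64,
}

/// Expands one Telegraf-style metric into one metric per field; the Telegraf
/// `name` becomes the shared `namespace`. Output order follows the BTreeMap's
/// sorted field names.
pub fn from_telegraf(t: &TelegrafMetric) -> Vec<Metric> {
    t.fields
        .iter()
        .map(|(field, value)| Metric {
            name: field.clone(),
            namespace: Some(t.name.clone()),
            value: *value,
        })
        .collect()
}
```

The namespace field keeps one-metric-per-value as the unit and only adds a grouping label. The Telegraf model would instead make the multi-field record itself the unit, which is the larger data model change mentioned above.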
- Do we want to have the `prometheus` source parse the `namespace` out of the metrics it scrapes? The naming conventions suggest that every metric name should start with one word describing the domain (or namespace), followed by a `_`. However, there is no requirement that Prometheus endpoints satisfy this. We could make it an optional directive on the source to control parsing metric namespaces.
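If such a directive existed, the parsing itself could be as simple as splitting at the first `_`. This is a hypothetical sketch: `split_namespace` is not an existing function, and the rule of leaving names without a non-empty prefix and suffix unsplit is an assumption.

```rust
/// Hypothetical opt-in parsing for the `prometheus` source: treat everything
/// before the first `_` as the namespace. Names without `_`, or with an empty
/// part on either side of it, are left unsplit.
pub fn split_namespace(name: &str) -> (Option<&str>, &str) {
    match name.split_once('_') {
        Some((ns, rest)) if !ns.is_empty() && !rest.is_empty() => (Some(ns), rest),
        _ => (None, name),
    }
}
```

This also shows why the parsing should be opt-in. A name like `http_requests_total` would yield the namespace `http`, which may not be the domain the endpoint intended.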
Incremental steps to execute this change:

- Submit a PR with `namespace` modeled as a first-class field.