-
Notifications
You must be signed in to change notification settings - Fork 656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics for other unhealthy VM states #2597
Add metrics for other unhealthy VM states #2597
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Looks like this is breaking integration specs in CI starting at
Edit: trimmed CI logs, see more relevant data in #2598 |
if !request_path.nil? | ||
request_path = request_path.sub(endpoint, '') | ||
end | ||
parsed_endpoint = URI.parse(endpoint + (request_path || '')) | ||
headers = options['headers'] || {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is causing invalid requests, from integration spec logs:
==> sandbox/logs/health_monitor.out-7b5e96eb <==
E, [2025-02-01T02:17:48.089550 #17476] ERROR : Invalid URI: https://127.0.0.1:61002https://127.0.0.1/tasks/151
E, [2025-02-01T02:17:48.089706 #17476] ERROR : /tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/director.rb:115:in `rescue in perform_request'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/director.rb:114:in `perform_request'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/director.rb:48:in `block in get_deployment_instances_full'
<internal:kernel>:187:in `loop'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/director.rb:46:in `get_deployment_instances_full'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/instance_manager.rb:37:in `block in fetch_deployments'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/instance_manager.rb:30:in `each'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/instance_manager.rb:30:in `fetch_deployments'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/runner.rb:182:in `fetch_deployments'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/runner.rb:131:in `block in poll_director'
/usr/local/bundle/ruby/3.3.0/gems/async-2.21.3/lib/async/task.rb:200:in `block in run'
/usr/local/bundle/ruby/3.3.0/gems/async-2.21.3/lib/async/task.rb:438:in `block in schedule'
I, [2025-02-01T02:17:48.090186 #17476] INFO : [ALERT] Alert @ 2025-02-01 02:17:48 UTC, severity 3: Invalid URI: https://127.0.0.1:61002https://127.0.0.1/tasks/151
D, [2025-02-01T02:17:48.090430 #17476] DEBUG : (Resurrector) ignoring event of category '': Alert @ 2025-02-01 02:17:48 UTC, severity 3: Invalid URI: https://127.0.0.1:61002https://127.0.0.1/tasks/151
I, [2025-02-01T02:17:48.090784 #17476] INFO : (Event logger) notifying director about event: Alert @ 2025-02-01 02:17:48 UTC, severity 3: Invalid URI: https://127.0.0.1:61002https://127.0.0.1/tasks/151
D, [2025-02-01T02:17:48.090854 #17476] DEBUG : sending HTTP POST to: https://127.0.0.1:61002/events
@anshrupani it would be good if you could look at this and try to see why this change is causing the invalid (ex: https://127.0.0.1:61002https://127.0.0.1/tasks/151
) requests to be generated.
What is this change about?
As of now, bosh does not collect metrics regarding VMs which are unhealthy (basically any state that is not running). With this PR, we now introduce a new metric: bosh_unhealthy_instances, which basically reports the same.
What tests have you run against this PR?
We tested a bosh dev release on our development environments for the new metric, and it worked fine. We also ran the bosh-monitor unit tests, and added more as needed.
How should this change be described in bosh release notes?
Bosh now emits a new metric: "bosh_unhealthy_instances", which can be used to track unhealthy instances per deployment over time.
Does this PR introduce a breaking change?
No
Tag your pair, your PM, and/or team!
@DennisAhaus