Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics for other unhealthy VM states #2597

Merged

Conversation

anshrupani
Copy link
Contributor

What is this change about?

As of now, bosh does not collect metrics regarding VMs which are unhealthy (basically any state that is not running). With this PR, we now introduce a new metric: bosh_unhealthy_instances, which basically reports the same.

What tests have you run against this PR?

We tested a bosh dev release on our development environments for the new metric, and it worked fine. We also ran the bosh-monitor unit tests, and added more as needed.

How should this change be described in bosh release notes?

Bosh now emits a new metric: "bosh_unhealthy_instances", which can be used to track unhealthy instances per deployment over time.

Does this PR introduce a breaking change?

No

Tag your pair, your PM, and/or team!

@DennisAhaus

@anshrupani anshrupani changed the title Add metrics for other vm states Add metrics for other unhealthy VM states Jan 22, 2025
src/bosh-monitor/lib/bosh/monitor/director.rb Outdated Show resolved Hide resolved
src/bosh-monitor/spec/unit/bosh/monitor/director_spec.rb Outdated Show resolved Hide resolved
src/bosh-monitor/lib/bosh/monitor/director.rb Outdated Show resolved Hide resolved
src/bosh-monitor/lib/bosh/monitor/director.rb Outdated Show resolved Hide resolved
src/bosh-monitor/spec/unit/bosh/monitor/director_spec.rb Outdated Show resolved Hide resolved
src/bosh-monitor/lib/bosh/monitor/director.rb Outdated Show resolved Hide resolved
@jpalermo jpalermo requested review from a team, aramprice and lnguyen and removed request for a team January 23, 2025 15:53
Copy link
Member

@xtreme-nitin-ravindran xtreme-nitin-ravindran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jpalermo jpalermo merged commit e1cdc79 into cloudfoundry:main Jan 30, 2025
9 checks passed
@aramprice
Copy link
Member

aramprice commented Jan 31, 2025

Looks like this is breaking integration specs in CI starting at

Failures:

  1) using director with config server when config server certificates are trusted when deployment manifest has variables when all variables are set in config server when health monitor is around and resurrector is enabled interpolates values correctly when resurrector kicks in
     Failure/Error: if vm.vm_cid == resurrected_vm.vm_cid
     
     NoMethodError:
       undefined method `vm_cid' for nil
     # ./spec/integration_support/director.rb:170:in `kill_vm_and_wait_for_resurrection'
     # ./spec/integration/config_server/config_server_spec.rb:401:in `block (6 levels) in <top (required)>'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example.rb:263:in `instance_exec'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example.rb:263:in `block in run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example.rb:511:in `block in with_around_and_singleton_context_hooks'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example.rb:468:in `block in with_around_example_hooks'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/hooks.rb:486:in `block in run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/hooks.rb:624:in `run_around_example_hooks_for'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/hooks.rb:486:in `run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example.rb:468:in `with_around_example_hooks'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example.rb:511:in `with_around_and_singleton_context_hooks'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example.rb:259:in `run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:646:in `block in run_examples'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:642:in `map'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:642:in `run_examples'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:607:in `run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `block in run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `map'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `block in run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `map'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `block in run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `map'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `block in run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `map'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/example_group.rb:608:in `run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/runner.rb:121:in `block (3 levels) in run_specs'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/runner.rb:121:in `map'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/runner.rb:121:in `block (2 levels) in run_specs'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/configuration.rb:2097:in `with_suite_hooks'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/runner.rb:116:in `block in run_specs'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/reporter.rb:74:in `report'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/runner.rb:115:in `run_specs'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/runner.rb:89:in `run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/runner.rb:71:in `run'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/lib/rspec/core/runner.rb:45:in `invoke'
     # /usr/local/bundle/ruby/3.3.0/gems/rspec-core-3.13.2/exe/rspec:4:in `<top (required)>'
     # /usr/local/bundle/ruby/3.3.0/bin/rspec:25:in `load'
     # /usr/local/bundle/ruby/3.3.0/bin/rspec:25:in `<top (required)>'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/lib/bundler/cli/exec.rb:59:in `load'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/lib/bundler/cli/exec.rb:59:in `kernel_load'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/lib/bundler/cli/exec.rb:23:in `run'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/lib/bundler/cli.rb:452:in `exec'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/lib/bundler/vendor/thor/lib/thor/command.rb:28:in `run'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/lib/bundler/vendor/thor/lib/thor.rb:538:in `dispatch'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/lib/bundler/cli.rb:35:in `dispatch'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/lib/bundler/vendor/thor/lib/thor/base.rb:584:in `start'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/lib/bundler/cli.rb:29:in `start'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/exe/bundle:28:in `block in <top (required)>'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/lib/bundler/friendly_errors.rb:117:in `with_friendly_errors'
     # /usr/local/lib/ruby/gems/3.3.0/gems/bundler-2.6.3/exe/bundle:20:in `<top (required)>'
     # /usr/local/bundle/bin/bundle:108:in `load'
     # /usr/local/bundle/bin/bundle:108:in `<main>'

Edit: trimmed CI logs, see more relevant data in #2598

Comment on lines +99 to +103
if !request_path.nil?
request_path = request_path.sub(endpoint, '')
end
parsed_endpoint = URI.parse(endpoint + (request_path || ''))
headers = options['headers'] || {}
Copy link
Member

@aramprice aramprice Feb 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is causing invalid requests, from integration spec logs:

==> sandbox/logs/health_monitor.out-7b5e96eb <==
E, [2025-02-01T02:17:48.089550 #17476] ERROR : Invalid URI: https://127.0.0.1:61002https://127.0.0.1/tasks/151
E, [2025-02-01T02:17:48.089706 #17476] ERROR : /tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/director.rb:115:in `rescue in perform_request'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/director.rb:114:in `perform_request'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/director.rb:48:in `block in get_deployment_instances_full'
<internal:kernel>:187:in `loop'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/director.rb:46:in `get_deployment_instances_full'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/instance_manager.rb:37:in `block in fetch_deployments'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/instance_manager.rb:30:in `each'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/instance_manager.rb:30:in `fetch_deployments'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/runner.rb:182:in `fetch_deployments'
/tmp/build/e55deab7/bosh/src/bosh-monitor/lib/bosh/monitor/runner.rb:131:in `block in poll_director'
/usr/local/bundle/ruby/3.3.0/gems/async-2.21.3/lib/async/task.rb:200:in `block in run'
/usr/local/bundle/ruby/3.3.0/gems/async-2.21.3/lib/async/task.rb:438:in `block in schedule'
I, [2025-02-01T02:17:48.090186 #17476]  INFO : [ALERT] Alert @ 2025-02-01 02:17:48 UTC, severity 3: Invalid URI: https://127.0.0.1:61002https://127.0.0.1/tasks/151
D, [2025-02-01T02:17:48.090430 #17476] DEBUG : (Resurrector) ignoring event of category '': Alert @ 2025-02-01 02:17:48 UTC, severity 3: Invalid URI: https://127.0.0.1:61002https://127.0.0.1/tasks/151
I, [2025-02-01T02:17:48.090784 #17476]  INFO : (Event logger) notifying director about event: Alert @ 2025-02-01 02:17:48 UTC, severity 3: Invalid URI: https://127.0.0.1:61002https://127.0.0.1/tasks/151
D, [2025-02-01T02:17:48.090854 #17476] DEBUG : sending HTTP POST to: https://127.0.0.1:61002/events

@anshrupani it would be good if you could look at this and try to see why this change is causing the invalid (ex: https://127.0.0.1:61002https://127.0.0.1/tasks/151) requests to be generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

8 participants