Skip to content

4. Java Application as a Runtime White Box: App running, JVM internals monitoring, troubleshooting, faults analysing and tuning.

Notifications You must be signed in to change notification settings

aazayka/java-application-monitoring-and-troubleshooting

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Java Application Monitoring and Troubleshooting Basics

4. Java Application as a Runtime White Box: App running, JVM and application monitoring, troubleshooting, faults analysing and tuning. 24 hrs / 3 days.

Webinar recordings

Training Objectives

  • Understanding modern application architecture and defect hotspots
  • Understanding JVM classes, memory and threading architecture
  • Hands-on skill of monitoring modern applications
  • Understanding modern IO architecture and its pitfalls
  • Hands-on skill of monitoring persistent data-driven applications

Prerequisites

Hardware

  • RAM ≥ 8Гб
  • Wi-Fi with Internet access

Software at student's developer station

Network access from student stations to emulation of prod host

Network Access from student stations and prod host

  • github.org :443 :80
  • repo1.maven.org :443 :80
  • jcenter.bintray.com :443 :80
  • hub.docker.com :443 :80

Agenda

* starred items and checked checklist items are optional

Training introducing and focusing (15m)

  • Schedule
  • Trainer
  • Training overview
  • Rules

Hands-on: Teams and their demand (15m)

  • Pairs forming and introduction
  • Attendees prerequisites check
  • Topics focus demand from attendees
  • Additional topics demand form attendees

Java Platform crash course (2h)

What do any application doing?

System as Public service Metaphor

Concept Metaphor Code
Thread Worker man Thread created by runtime: java MyApplication
Data input Visitor's wishes Console user input
Data processing Meal recipes, conversation scripts, labor instructions Code as instructions
Data storing Persistent production store Files as persistent store
Data output Giving away to Visitor his meals Console output

How we do model the data?

Concept Metaphor Code
Primitive Types People can think and communicate only with numbers and strings String restaurant menu
Structures People can think with composite entities, concepts Domain class and enum
Object of structure Instance of concept, with its own state differs from other instance Dealing with particular object while processing request

How we do model the behavior?

Concept Metaphor Code
Procedure Meal recipe or conversation script Setting behavior with methods
Call stack Chain of actions workers call at others Calling method from method
Class Role: Chief or Waiter, state + bunch of procedures dealing with it Today we likely divide state and behavior to domain entities and services
Object of class Johnny the Chief and Maggy the chief differs with its state but have same behavior
Application logic Scenario how to behave all the workers in any case Workers takes responsibilities on them to rule at their level

Where data is stored? Core data scopes

Concept Metaphor Implementation
Local/method/stack variables Short-term memory: Chief remember sugar doze only when doing sugaring Call Stack
Parameters Details when asking others to do some work: waiter asks johnnyChief.makeMeal(whatMeals?) Call Stack
Object state State of worker or structure: its current properties values Heap object space
- Request scope Some object state accessible to all the workers in call chain handling request: sticky note or voice message given each worker to next, "not spicy" Parameters, framework support, ThreadLocal
- Session scope Some object state accessible to all the workers handling all requests from the same Visitor: "its for table 13" Framework support
- Singleton/application scope Some object state accessible to all the workers Framework support, Language support for static variables
Persistent Long-term data store surviving system restarts File, embedded/local database, remote filesystem, remote database
Integration Data stored and processed by external system Remote system procedure call, message queue

How do we implement application with Java

Concept Metaphor Reality
Runtime If Developer is CEO setting application logic, Runtime is your vice JVM API and system library API
Working with thread: Thread API, states, pooling We can create work force on demand to execute our instructions But we have some RAM memory and performance cost
Working with class: dynamic classloading Instructions what to do workers get just in time not ahead but worker remember it till die But we have run-time latency costs
Working with instance: create and GC We ask our vice to hire and retire workers Objects state costs us RAM memory. When object's no longer needed it purged from RAM

How do we build Java application?

  • JVM vs JRE vs JDK
  • Phisical point ov view for java application
  • Classes, packages and JARs
  • classpath x2
  • Build cycle raw
  • Build cycle with Maven

How do we run Java application?

  • JVM vs JRE vs JDK
  • Run with JVM
  • Ways for application run-time parameterization: jvm parameters, program arguments, sys/app properties
  • Key JVM parameters for memory setup

How do we monitor Java application internals?

  • JMX simple tooling demo: JVisualVM
  • JMX architecture overview

Teamwork: NFRs and metrics checklist (15m)

  • What Quality Attributes/NFRs does JVM provide for application?
  • What Quality Attributes/NFRs do we satisfy with application monitoring?
  • Start metrics checklist by tier: JVM metrics

Hands-on: Simple application local building, running and monitoring (30m)

Given

cd
git clone https://github.com/{{ STUDENT_ACCOUNT }}/java-application-monitoring-and-troubleshooting
cd java-application-monitoring-and-troubleshooting
git checkout {{ group_custom_branch }}

When

  • Project application built locally with maven
mvn clean verify [-DskipTests]
  • Project application ran locally with CLI
java \
  -Xms128m -Xmx256m \
  -cp target/dbo-1.0-SNAPSHOT.jar \
  -Dapp.property=value \
  com.acme.dbo.Presentation \
  program arguments
  • JVisualVM profiler connected to running app
$JAVA_HOME/bin/jvisualvm
  • OS-specific monitoring tool shows application process details
linux$ top [-pid jvmpid]
windows> taskmgr

Then answered and reviewed at debrief

  • What is the default encoding for I/O?
  • What is the default heap size for app running?
  • How many java threads is active within JVM?
  • How many OS threads is active within OS JVM process?
  • What is the minimal possible heap size for app running?
  • What is the difference for profiler times: Self time/Total time, CPU time?

Modern applications architecture and deployment: What tiers do we monitor? (1h)

Tier
Application Layers: UI/P, API/C, BL/S, DAL/R
Application caching
Thread Pool
JPA Caching
JPA subsystem
Connection Pools
JDBC subsystem
Framework configuration with profiles
Framework for Spring modules management
Framework for Web/SOAP/REST application expose
Framework for Application
Application Server/Servlet Container
JVM: application debug API
JVM: application profiling API
JVM: universal monitoring API
JVM: threads, IO
JVM: memory, GC
JVM: process
Container: Networking
Container: Core
Message queues
DBMS
OS: Threads
OS: Processes
Hardware: HDD/SSD
Hardware: RAM
Hardware: CPU

Tiers and components to monitor diagram

puml
@startuml
!define ICONURL https://raw.githubusercontent.com/tupadr3/plantuml-icon-font-sprites/v2.1.0/devicons
!includeurl ICONURL/coda.puml
!define SPRITESURL https://raw.githubusercontent.com/rabelenda/cicon-plantuml-sprites/v1.0/sprites
!includeurl SPRITESURL/server.puml
!includeurl SPRITESURL/linux.puml
!includeurl SPRITESURL/docker.puml
!includeurl SPRITESURL/java.puml
!includeurl SPRITESURL/tomcat.puml
!includeurl SPRITESURL/cog.puml


component "<$server>\nhardware" as hardware #lightgray {
    [CPU]
    [RAM]
    [HDD]
    [LAN]

    component "<$linux>\nOS" as os #white {
        [container support] 
        [process management]
        [thread management]
        [filesystem i/o]
        [network i/o]

        component "<$docker>\ncontainer" as container #lightgray {
            [network virtualization]
            [port mapping]
            [overlay fs]
            database "disk image"
            
            component "<$java>\njvm process" as jvm #white {
                [class loading]
                [memory management + GC]
                [thread management]
                [filesystem i/o api]
                [network i/o api]
                [monitoring API]
                [profiling API]
                [dubug API]

                component "<$tomcat>\nservlet container" as web_container #lightgray {
                    [tcp connection \n management]
                    [http protocol \n handling]
                    [web application \n lifecycle]
                    [java components \n lifecycle]
                    [thread pools \n management]

                    component "jdbc connection pool" as container_cp {
                        [jdbc driver]
                    }

                    component "<$coda>\nframework modules management system" as spring_boot #white {
                        [framework modules \n management]
                        [application \n configuration context \n management]

                        component "<$coda>\napplication framework" as spring_core #lightgray {
                            [application configuration \n handling]
                            [application configuration \n profiles support]
                            [application components \n management]
                            [common scopes \n management]
                            [user-defined thread pools \n management]
                            [logging \n management]

                            component "jpa persistent provider" #white {
                                [db data caching \n management]
                                component "jdbc connection pool" as app_cp {
                                    [jdbc driver]
                                }
                            }

                            component "<$coda>\nweb/soap/rest framework" as spring_mvc #white {
                                [http protocol \n API]
                                [request routing]
                                [http scopes \n management]
                                [monitoring \n endpoint]

                                component "<$cog>\napplication" as app #lightgray {
                                    [app data \n caching management] #lightgray 
                                    
                                    package "data access \n layer" as dal #white {
                                        [repository]
                                    }
                                    package "business logic \n layer" as bl #white {
                                        [service]
                                    }
                                    package "api \n layer" as cl #white {
                                        [controller]
                                    }
                                    package "presentation \n layer" as pl #white {
                                        [view]
                                    }

                                    service -> repository 
                                    controller -> service
                                    view -> controller
                                }
                            }
                        }
                    }
                }     
            }
        } 
    }
}
@enduml

Teamwork: What metrics do we monitor for production app? (30m)

Monitoring architecture overview (30m)

Inrastructure overview

pUML source
@startuml
node "dev station" as devstation {
 [ssh terminal] as terminal
 [ansible playbook] as ansible
 [browser]
 [jmeter]

 ansible -> terminal
}

actor Ops as ops
ops --> ansible
ops --> terminal
ops --> browser
ops --> jmeter

node prod {
 [jmeter agent] as jmeter_agent
 [node exporter] as node_exporter

 component [application] {
  [monitoring endpoint] as monitor
 }

 component [prometheus] {
  database metrics_history
 }

 prometheus --> monitor
 prometheus -> node_exporter

 jmeter_agent -> application
 node_exporter -> prod

 interface port
 monitor -( port
}

terminal --> prod
browser --> prometheus
browser --> application
jmeter --> jmeter_agent
@enduml

Monitoring overview and tools

Load generation architecture overview

  • Types of performance testing except stress testing?
  • While monitoring: What type should we use? What performance metrics do we test?
  • Testing vs Monitoring

Hands-on: Prod host and monitoring provisioning (15m)

Given

cd iaac

When

  • Steps executed according Provisioning documentation

Then

  • Prometheus UI up and running at http://{{ prod }}:9090/alerts
  • JMeter can connect agent deployed at {{ prod }}:
jmeter -Jremote_hosts=127.0.0.1 -Dserver.rmi.ssl.disable=true

JMeter → Options → Log Viewer
JMeter → Run → Remote Start → 127.0.0.1

Modern applications architecture and deployment: How do we monitor tiers? (1h)

Tier Implementation Tools
Application Layers PWA or Server-side Template Engine, Spring @Controllers, @Services, Spring Data JPA @Repositories Spring Metrics for Counters, Timers, Long Task Timers, Statistics
Application caching spring-boot-starter-cache module + built-in default Simple cache provider Spring Metrics for Caches
Thread Pool Java built-in ExecutorService Spring Metrics for DataSources
JPA subsystem and JPA Caching Hibernate service:jmx:// Hibernate built-in statistics
JDBC subsystem and Connection Pools Derby JDBC driver + HikariCP service:jmx://com.zaxxer.hikari, Spring Metrics for DataSources
Framework for modules management Spring Boot spring-boot-actuator + Built-in Micrometer + Prometheus Adapter
Framework for Application Spring Core + Spring MVC (spring-boot-starter-web) Spring Metrics for Web Instrumentation [for Prometheus], Core Micrometer [for Prometheus]
Application Server/Servlet Container spring-boot-starter-tomcat
JVM: application debug API JPDA jsadebugd
JVM: application profiling API JVMTI hprof
JVM: threads, IO JVM scheduler, JNI jstack
JVM: memory, GC Built-in Garbage Collectors jstat, jstatd, jmap, jhat
JVM: universal monitoring API JMX jvisualvm
JVM: process Oracle/OpenJDK JRE jps, jcmd, jinfo
Containers Docker docker cli, docker api for Prometheus, Prometheus cAdvisor
Message queues n/u vendor tools, prometheus exporters
DBMS Apache Derby / Postgresql vendor tools, Prometheus pg_exporter, pg explain, pg analyse
OS Linux ps, top
Hardware x86 df, free, SNMP, Prometheus Node Exporter

Hands-on: Modern application remote building, running and monitoring (30m)

Given

  • Given rights for application folder to developer user
  • ssh session to {{ prod }}:ansible_port
ssh -p {{ ansible_port }} {{ ansible_user }}@{{ prod }}
cd /opt
git clone --branch master --depth 1 https://github.com/{{ STUDENT_ACCOUNT }}/agile-practices-application
cd agile-practices-application
mvn clean verify [-DskipTests]

When

  • Application ran at {{ prod }}
cd /opt/agile-practices-application
rm -rf dbo-db
nohup \
  java \
    -Xms128m -Xmx128m \
    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=heapdump.hprof \
    -Dderby.stream.error.file=log/derby.log \
    -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.rmi.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=0.0.0.0 \
    -jar target/dbo-1.0-SNAPSHOT.jar \
      --spring.profiles.active=qa \
      --server.port=8080 \
> /dev/null 2>&1 &
  • Load emulation ran at dev station
jmeter -n -t load.jmx -Jremote_hosts=127.0.0.1 -Dserver.rmi.ssl.disable=true
  • CLI tools used at {{ prod }}
df -ah
free -m

docker images -a
docker ps -a

ps -ef
ps -eaux --forest
ps -eT | grep <pid>

top
top + 'f'
top -p <pid>
top -H -p <pid>

jps [-lvm]
jcmd <pid> help
jcmd <pid> VM.uptime
jcmd <pid> VM.system_properties
jcmd <pid> VM.flags
  • Web applications used from dev station
http://{{ prod }}:8080/dbo/swagger-ui.html

http://{{ prod }}:8080/dbo/actuator/health
http://{{ prod }}:8080/dbo/actuator
http://{{ prod }}:8080/dbo/actuator/prometheus

http://{{ prod }}:9090/alerts
http://{{ prod }}:9090/graph
http://{{ prod }}:9090/graph?g0.range_input=15m&g0.tab=0&g0.expr=http_server_requests_seconds_count

Finally

  • JMeter load emulation stopped at dev station
  • Application gracefully stopped at {{ prod }}
curl --request POST http://{{ prod }}:8080/dbo/actuator/shutdown
rm -rf dbo-db

Then answered and reviewed at debrief

  • Free HDD space? Free RAM?
  • How many JVMs running?
  • What DBMS used for application?
  • What JVM version used for application? What are the parameters, properties and arguments used?
  • How many Docker containers are running? What images used?
  • What are the health indicator for application?
  • What is the application uptime?
  • What is the CPU usage for application?
  • How many http requests servlet container handled by different URLs?
  • How many http sessions are active?
  • What is the current system load average?

Typical JVM memory issues (3)

JVM memory architecture

  • On-heap and off-heap architectures
  • GC algorithms
  • Memory structures for typical GCs

Heap dumps and key memory metrics

  • Creating
  • Analysing

Demo

  • Memory parameters tuning
  • Analyse metrics with Prometheus
  • Heap dump analysing

Hands-on

  • Add new metrics to checklist by tier: JVM
  • Given workload
  • Analyse metrics with Prometheus
  • Analyse remote heap dump
  • Make issue hypothesis report and resolving plan

Typical issues and resolution

  • Leaks
  • OOME for different generations

GC issues

  • stop-the-world problem
  • GC trade-off for latency and thoughput

Demo

  • GC statistics monitoring

Teamwork

  • New metrics to checklist by tier: JVM

Hands-on

  • Given workload tool and test plan
  • Analyse GC settings
  • Analyse GC statistics with Prometheus
  • Make issue hypothesis report and resolving plan
jcmd <pid> GC.heap_dump /tmp/dump.hprof
jmap -dump:live,format=b,file=/tmp/dump.hprof <pid>

Typical JVM threading issues (3)

JVM threading architecture

  • Threads
  • Sheduler and preemtive concurrency
  • Sheduling overhead
  • Green and native threads
  • Thread states
  • Types of blocking/waiting

Demo

  • Making thread dump and analysing with IDE
  • Making thread dump and analysing with Profiler
  • Monitoring threads online with local JMX Profiler
  • Analyse thread statistics with Prometheus

Application threading architecture

  • Thread pooling patterns
  • Threading patterns for connections
  • Threading patterns for logic processing
  • Data access concurrency architectures
  • Cooperative concurrency application arhitecture

Typical issues and resolution

  • Paralllism issues and patterns
  • Concurrency issues and patterns

Teamwork

  • New metrics to checklist by tier: JVM

Hands-on

  • Given workload
  • Analyse thread statistics with Prometheus
  • Make issue hypothesis report and resolving plan

Typical JVM IO issues (3)

Blocking IO architecture

  • Syncronous IO concept
  • Building blocks

Demo

  • Sync remote call implementation

Typical issues

  • Encoding
  • Buffering
  • Blocking for user data
  • Excessive IO classes objects allocation
  • Closing resources
  • Pooling resources

Hands-on

  • Given workload
  • Analyse IO operations with Prometheus and logs
  • Make issue hypothesis report and resolving plan

Non-blocking IO architecture

  • Asyncronous IO concept
  • NIO building blocks

Demo

  • Async remote call implementation

Typical issues

  • Code complexity
  • Error handling
  • Response time

Teamwork

  • New metrics to checklist by tier: JVM, OS

Hands-on

  • Given workload
  • Analyse IO operations with Prometheus and logs
  • Make issue hypothesis report and resolving plan

Typical data storage issues (3)

JDBC architecture

  • JDBC API
  • Driver types
  • Prefetching tuning
  • Prepared statements
  • Batch operattions
  • Transactions
  • Isolation levels
  • Connection pools

Demo

  • Database CRUD implementaion with low-level JDBC API
  • Database CRUD implementaion with Spring JDBC Template

Hands-on

  • Given workload
  • Analyse DB operations
  • Make issue hypothesis report and resolving plan

JPA architecture

  • JPA API
  • Caching levels
  • JPA transactions architecture

Spring JPA architecture

  • Spring Data JPA
  • Transaction management

Demo

  • Database CRUD implementaion with Spring Data JPA

Teamwork

  • New metrics to checklist by tier: JPA, JVM

Hands-on

  • Given workload
  • Analyse DB operations
  • Make issue hypothesis report and resolving plan

Final retro (0.5)

  • Value taken
  • Process Improvement Actions
  • Training Improvement Actions

Typical JVM containerization issues (1)*

Containerization architecture

  • Docker overview
  • Docker containers
  • Docker images
  • Image provisioning and repositories

Demo

  • Application containerization
  • Configurating container and resource limits
  • Running and monitoring container

Containerization issues

  • Memory issues and patterns
  • Disk I/O issues and patterns

Teamwork

  • New metrics to checklist by tier: JVM, OS

Hands-on

  • Given workload
  • Modify container configuration with K8s сonfig
  • Analyse system metrics with Prometheus
  • Make issue hypothesis report and resolving plan

Typical caching issues (1.5)*

Caching concept

  • Why caches?
  • Caching architecture: levels

Demo

  • Caching proxy
  • Java Cache API
  • Spring application caching
  • JPA cache levels
  • DB caches

Typical issues

  • Cold start
  • Hit statistics
  • Cache resetting and inconsistency

Teamwork

  • New metrics to checklist by tier: caches

Hands-on

  • Given workload
  • Analyse application caches configuration
  • Analyse caches statisitcs
  • Make issue hypothesis report and resolving plan

Generating application workload (1.5)*

Load test design

  • Black-box approach
  • Load test structure
  • Load tests suite
  • Metrics to analyse

Load test types

  • Load
  • Stress
  • Spike
  • Redundancy

Demo with JMeter tool

  • Agent architecture
  • Test plan
  • Configuring reports
  • Running workload
  • Report analysis

Hands-on

  • Congiuring workload plan
  • Running workload
  • Analysing reports
  • Issue hypothesis

Distributed logging (1.5)*

Intro to Java logging solutions

  • Java logging libraries hell
  • SLF4J and Logback overview
  • Logging architecture
  • Application configuration

Hands-on

  • Configuring application local logging

Distributed logging collection and processing

  • Distributed logging collection architecture with ELK stack
  • Application configuration
  • Searching and analysing logs

Demo

  • Configuring application distributed logging

Hands-on

  • Given configuration
  • Load tests run
  • Analysing logs with ELK

System monitoring (1.5)*

Distributed monitoring arhitecture

  • Prometheus architecture overview
  • Metrics sources and agents
  • Analysing monitoring dashboards and alerts

Teamwork

  • New metrics checklist by tier: system and OS

Demo

  • Configuring hardware metrics dasboard and alerts with Prometheus

Hands-on

  • Given configuration
  • Load tests run
  • Analysing metrics and alerts with Prometheus

Teamwork

  • New metrics to checklist by tier: JVM

Demo

  • Configuring JVM through JMX metrics with Prometheus

Hands-on

  • Given configuration
  • Load tests run
  • Analysing metrics and alerts with Prometheus
  • Make issue hypothesis report and resolving plan

Typical RDBMS issues (1.5)*

DB architecture

  • DB request processing
  • DB execution plan
  • Constraints
  • Indexes
  • Transactions implementation architectures
  • "Vacuum" side effects

Demo

  • Profiling DB request with explain

Teamwork

  • New metrics to checklist by tier: DBMS, OS

Hands-on

  • Given workload
  • Analyse DB schema
  • Analyse requests profiles
  • Make issue hypothesis report and resolving plan

Typical CI/CD pipeline overview (1.5)*

How to deal with typical distributed system issues? (2.5)*

First Law of Distributed Objects

  • Distributed systems: Seasons in the abbys
  • CAP thesis
  • Data storage architectures overview in CAP terms

Microservices architecture patterns and trade-offs

  • Why microservices?
  • Data encapsulation
  • Gateway
  • Services discovery
  • Data duplication
  • Distributed transactions

Monitoring and tracing

  • Monitoring patterns
  • Tracing patterns

Demo

  • Business operation in microservices environment tracing

Hands-on

  • Given workload
  • Analyse distributed architecture
  • Analyse call trace
  • Play "hell monkey"
  • Make issue hypothesis report and resolving plan

About

4. Java Application as a Runtime White Box: App running, JVM internals monitoring, troubleshooting, faults analysing and tuning.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published