4. Java Application as a Runtime White Box: running the app, JVM and application monitoring, troubleshooting, fault analysis and tuning. 24 hrs / 3 days.
- Understanding modern application architecture and defect hotspots
- Understanding JVM classes, memory and threading architecture
- Hands-on skill of monitoring modern applications
- Understanding modern IO architecture and its pitfalls
- Hands-on skill of monitoring persistent data-driven applications
- RAM ≥ 8 GB
- Wi-Fi with Internet access
- prod accessible
- Ports at {{ prod }}:ports_needed accessible
- github.com :443 :80
- repo1.maven.org :443 :80
- jcenter.bintray.com :443 :80
- hub.docker.com :443 :80
* starred items and checked checklist items are optional
- Schedule
- Trainer
- Training overview
- Rules
- Pairs forming and introduction
- Attendees prerequisites check
- Topics focus demand from attendees
- Additional topics demand from attendees
System as Public service Metaphor
Concept | Metaphor | Code |
---|---|---|
Thread | Worker man | Thread created by runtime: java MyApplication |
Data input | Visitor's wishes | Console user input |
Data processing | Meal recipes, conversation scripts, labor instructions | Code as instructions |
Data storing | Persistent production store | Files as persistent store |
Data output | Handing the Visitor their meals | Console output |
Concept | Metaphor | Code |
---|---|---|
Primitive Types | People can think and communicate only with numbers and strings | String restaurant menu |
Structures | People can think with composite entities, concepts | Domain class and enum |
Object of structure | Instance of a concept, with its own state that differs from other instances | Dealing with a particular object while processing a request |
Concept | Metaphor | Code |
---|---|---|
Procedure | Meal recipe or conversation script | Setting behavior with methods |
Call stack | Chain of actions in which workers call on one another | Calling a method from a method |
Class | Role: Chef or Waiter, state + a bunch of procedures dealing with it | Today we usually split state and behavior into domain entities and services |
Object of class | Johnny the Chef and Maggy the Chef differ in state but share the same behavior | |
Application logic | Scenario describing how all the workers behave in every case | Workers take responsibility for decisions at their own level |
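
The rows above could map to Java roughly as follows. This is a minimal sketch, not code from the training project: `RestaurantSketch`, `Meal`, `Chef` and `Waiter` are hypothetical names chosen to match the metaphor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for illustration only; not taken from the training project.
class RestaurantSketch {
    // Structure: a composite concept instead of raw strings and numbers.
    enum Meal { SOUP, STEAK, SALAD }

    // Class: a role with state plus the procedures that work with that state.
    static class Chef {
        private final String name;                            // object state
        private final List<Meal> cooked = new ArrayList<>();  // object state

        Chef(String name) {
            this.name = name;
        }

        // Procedure: the "recipe".
        String makeMeal(Meal meal) {
            cooked.add(meal);
            return meal + " by " + name;
        }

        int cookedCount() {
            return cooked.size();
        }
    }

    static class Waiter {
        private final Chef chef;

        Waiter(Chef chef) {
            this.chef = chef;
        }

        // Call stack: serve() calls makeMeal(), one worker asking another.
        String serve(Meal order) {
            return "Served: " + chef.makeMeal(order);
        }
    }

    public static void main(String[] args) {
        Chef johnny = new Chef("Johnny");
        Chef maggy = new Chef("Maggy");   // same behavior, separate state
        System.out.println(new Waiter(johnny).serve(Meal.SOUP));
        System.out.println(johnny.cookedCount() + " vs " + maggy.cookedCount());
    }
}
```

Both `Chef` objects run the same `makeMeal` code, yet only the one that was asked to cook changes its `cooked` list.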
Concept | Metaphor | Implementation |
---|---|---|
Local/method/stack variables | Short-term memory: the Chef remembers the sugar dose only while adding sugar | Call Stack |
Parameters | Details when asking others to do some work: waiter asks johnnyChief.makeMeal(whatMeals?) | Call Stack |
Object state | State of worker or structure: its current properties values | Heap object space |
- Request scope | Some object state accessible to all the workers in the call chain handling a request: a sticky note or voice message passed from each worker to the next, "not spicy" | Parameters, framework support, ThreadLocal |
- Session scope | Some object state accessible to all the workers handling all requests from the same Visitor: "it's for table 13" | Framework support |
- Singleton/application scope | Some object state accessible to all the workers | Framework support, Language support for static variables |
Persistent | Long-term data store surviving system restarts | File, embedded/local database, remote filesystem, remote database |
Integration | Data stored and processed by external system | Remote system procedure call, message queue |
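
A minimal sketch of where the in-memory scopes from this table could live in plain Java, without any framework. All names (`ScopesSketch`, `requestNote`, `servedTotal`, `cook`, `handleRequest`) are hypothetical, and a real application would normally get request and session scopes from its framework.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names for illustration only.
class ScopesSketch {
    // Application scope: one static value shared by every thread.
    static final AtomicInteger servedTotal = new AtomicInteger();

    // Request scope: each thread handling a request sees only its own value.
    static final ThreadLocal<String> requestNote = new ThreadLocal<>();

    // The parameter and the local variable live on the calling thread's stack.
    static String cook(String meal) {
        String note = requestNote.get();   // local variable, gone after return
        servedTotal.incrementAndGet();
        return meal + (note == null ? "" : " (" + note + ")");
    }

    // Handles one "request" on its own thread and returns what was cooked.
    static String handleRequest(String meal, String note) {
        String[] result = new String[1];
        Thread worker = new Thread(() -> {
            requestNote.set(note);
            try {
                result[0] = cook(meal);
            } finally {
                requestNote.remove();      // do not leak state into reused threads
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("soup", "not spicy"));
        System.out.println(handleRequest("steak", null));
        System.out.println("served in total: " + servedTotal.get());
    }
}
```

The `remove()` call matters once threads are pooled: a note left behind would show up in the next request served by the same thread.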
Concept | Metaphor | Reality |
---|---|---|
Runtime | If the Developer is the CEO who sets application logic, the Runtime is the vice president | JVM API and system library API |
Working with threads: Thread API, states, pooling | We can create workforce on demand to execute our instructions | But each thread costs RAM and performance |
Working with classes: dynamic classloading | Workers get their instructions just in time rather than ahead of time, and remember them for life | But we pay run-time latency costs |
Working with instances: create and GC | We ask our vice president to hire and retire workers | Object state costs us RAM. When an object is no longer needed, it is purged from RAM |
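
A minimal sketch of the first two rows in code, assuming hypothetical names (`RuntimeSketch`, `runOnNewThread`, `instantiate`); the class loaded by name is only an example. Garbage collection is not shown because its timing cannot be controlled from application code.

```java
import java.util.List;

// Hypothetical names for illustration only.
class RuntimeSketch {
    // Thread API: ask the runtime for an extra worker and wait until it is done.
    static String runOnNewThread(String threadName) {
        String[] seenName = new String[1];
        Thread worker = new Thread(
                () -> seenName[0] = Thread.currentThread().getName(), threadName);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seenName[0];
    }

    // Dynamic class loading: the class is resolved by name at run time,
    // then an instance is created through its no-argument constructor.
    static Object instantiate(String className) {
        try {
            Class<?> type = Class.forName(className);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnNewThread("cook-1"));
        Object list = instantiate("java.util.ArrayList");
        System.out.println(list.getClass().getName() + " is a List: " + (list instanceof List));
    }
}
```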
- JVM vs JRE vs JDK
- Physical point of view of a Java application
- Classes, packages and JARs
- classpath x2
- Build cycle raw
- Build cycle with Maven
- JVM vs JRE vs JDK
- Run with JVM
- Ways for application run-time parameterization: jvm parameters, program arguments, sys/app properties
- Key JVM parameters for memory setup
- JMX simple tooling demo: JVisualVM
- JMX architecture overview
- What Quality Attributes/NFRs does JVM provide for application?
- What Quality Attributes/NFRs do we satisfy with application monitoring?
- Start metrics checklist by tier: JVM metrics
- Satisfied prerequisites
- Forked simple project codebase
- Cloned fork locally
cd
git clone https://github.com/{{ STUDENT_ACCOUNT }}/java-application-monitoring-and-troubleshooting
cd java-application-monitoring-and-troubleshooting
git checkout {{ group_custom_branch }}
- Project application built locally with maven
mvn clean verify [-DskipTests]
- Project application ran locally with CLI
java \
-Xms128m -Xmx256m \
-cp target/dbo-1.0-SNAPSHOT.jar \
-Dapp.property=value \
com.acme.dbo.Presentation \
program arguments
- JVisualVM profiler connected to running app
$JAVA_HOME/bin/jvisualvm
- OS-specific monitoring tool shows application process details
linux$ top [-p jvmpid]
windows> taskmgr
- What is the default encoding for I/O?
- What is the default heap size for app running?
- How many Java threads are active within the JVM?
- How many OS threads are active within the OS JVM process?
- What is the minimal possible heap size for app running?
- What is the difference between the profiler times: Self time, Total time and CPU time?
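
Some of the questions above can also be answered from inside the running JVM. A minimal sketch using only standard JDK APIs; the class and method names (`JvmFactsSketch` and its helpers) are hypothetical. The OS-level thread count is not visible this way and still needs an OS tool.

```java
import java.lang.management.ManagementFactory;
import java.nio.charset.Charset;

// Hypothetical names for illustration only.
class JvmFactsSketch {
    // Default encoding the JVM uses when no charset is passed explicitly.
    static String defaultCharset() {
        return Charset.defaultCharset().name();
    }

    // Upper bound the heap may grow to (normally governed by -Xmx).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Live Java threads, daemon threads included.
    static int liveJavaThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("default charset:   " + defaultCharset());
        System.out.println("max heap, bytes:   " + maxHeapBytes());
        System.out.println("live Java threads: " + liveJavaThreads());
    }
}
```

Running it with different `-Xmx` and `-Dfile.encoding` values is a quick way to compare the numbers with what JVisualVM shows.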
Tier |
---|
Application Layers: UI/P, API/C, BL/S, DAL/R |
Application caching |
Thread Pool |
JPA Caching |
JPA subsystem |
Connection Pools |
JDBC subsystem |
Framework configuration with profiles |
Framework for Spring modules management |
Framework for Web/SOAP/REST application expose |
Framework for Application |
Application Server/Servlet Container |
JVM: application debug API |
JVM: application profiling API |
JVM: universal monitoring API |
JVM: threads, IO |
JVM: memory, GC |
JVM: process |
Container: Networking |
Container: Core |
Message queues |
DBMS |
OS: Threads |
OS: Processes |
Hardware: HDD/SSD |
Hardware: RAM |
Hardware: CPU |
Tiers and components to monitor diagram
puml
@startuml
!define ICONURL https://raw.githubusercontent.com/tupadr3/plantuml-icon-font-sprites/v2.1.0/devicons
!includeurl ICONURL/coda.puml
!define SPRITESURL https://raw.githubusercontent.com/rabelenda/cicon-plantuml-sprites/v1.0/sprites
!includeurl SPRITESURL/server.puml
!includeurl SPRITESURL/linux.puml
!includeurl SPRITESURL/docker.puml
!includeurl SPRITESURL/java.puml
!includeurl SPRITESURL/tomcat.puml
!includeurl SPRITESURL/cog.puml
component "<$server>\nhardware" as hardware #lightgray {
[CPU]
[RAM]
[HDD]
[LAN]
component "<$linux>\nOS" as os #white {
[container support]
[process management]
[thread management]
[filesystem i/o]
[network i/o]
component "<$docker>\ncontainer" as container #lightgray {
[network virtualization]
[port mapping]
[overlay fs]
database "disk image"
component "<$java>\njvm process" as jvm #white {
[class loading]
[memory management + GC]
[thread management]
[filesystem i/o api]
[network i/o api]
[monitoring API]
[profiling API]
[debug API]
component "<$tomcat>\nservlet container" as web_container #lightgray {
[tcp connection \n management]
[http protocol \n handling]
[web application \n lifecycle]
[java components \n lifecycle]
[thread pools \n management]
component "jdbc connection pool" as container_cp {
[jdbc driver]
}
component "<$coda>\nframework modules management system" as spring_boot #white {
[framework modules \n management]
[application \n configuration context \n management]
component "<$coda>\napplication framework" as spring_core #lightgray {
[application configuration \n handling]
[application configuration \n profiles support]
[application components \n management]
[common scopes \n management]
[user-defined thread pools \n management]
[logging \n management]
component "jpa persistent provider" #white {
[db data caching \n management]
component "jdbc connection pool" as app_cp {
[jdbc driver]
}
}
component "<$coda>\nweb/soap/rest framework" as spring_mvc #white {
[http protocol \n API]
[request routing]
[http scopes \n management]
[monitoring \n endpoint]
component "<$cog>\napplication" as app #lightgray {
[app data \n caching management] #lightgray
package "data access \n layer" as dal #white {
[repository]
}
package "business logic \n layer" as bl #white {
[service]
}
package "api \n layer" as cl #white {
[controller]
}
package "presentation \n layer" as pl #white {
[view]
}
service -> repository
controller -> service
view -> controller
}
}
}
}
}
}
}
}
}
@enduml
- Add metrics to checklist by tiers
pUML source
@startuml
node "dev station" as devstation {
[ssh terminal] as terminal
[ansible playbook] as ansible
[browser]
[jmeter]
ansible -> terminal
}
actor Ops as ops
ops --> ansible
ops --> terminal
ops --> browser
ops --> jmeter
node prod {
[jmeter agent] as jmeter_agent
[node exporter] as node_exporter
component [application] {
[monitoring endpoint] as monitor
}
component [prometheus] {
database metrics_history
}
prometheus --> monitor
prometheus -> node_exporter
jmeter_agent -> application
node_exporter -> prod
interface port
monitor -( port
}
terminal --> prod
browser --> prometheus
browser --> application
jmeter --> jmeter_agent
@enduml
- Types of performance testing besides stress testing?
- While monitoring: What type should we use? What performance metrics do we test?
- Testing vs Monitoring
- Ansible provisioning scripts and assets
cd iaac
- Provisioning documentation
- Steps executed according to the Provisioning documentation
- Prometheus UI up and running at
http://{{ prod }}:9090/alerts
- JMeter can connect to the agent deployed at {{ prod }}:
jmeter -Jremote_hosts=127.0.0.1 -Dserver.rmi.ssl.disable=true
JMeter → Options → Log Viewer
JMeter → Run → Remote Start → 127.0.0.1
Tier | Implementation | Tools |
---|---|---|
Application Layers | PWA or Server-side Template Engine, Spring @Controllers, @Services, Spring Data JPA @Repositories | Spring Metrics for Counters, Timers, Long Task Timers, Statistics |
Application caching | spring-boot-starter-cache module + built-in default Simple cache provider | Spring Metrics for Caches |
Thread Pool | Java built-in ExecutorService | Spring Metrics for DataSources |
JPA subsystem and JPA Caching | Hibernate | service:jmx:// Hibernate built-in statistics |
JDBC subsystem and Connection Pools | Derby JDBC driver + HikariCP | service:jmx://com.zaxxer.hikari, Spring Metrics for DataSources |
Framework for modules management | Spring Boot | spring-boot-actuator + Built-in Micrometer + Prometheus Adapter |
Framework for Application | Spring Core + Spring MVC (spring-boot-starter-web) | Spring Metrics for Web Instrumentation [for Prometheus], Core Micrometer [for Prometheus] |
Application Server/Servlet Container | spring-boot-starter-tomcat | |
JVM: application debug API | JPDA | jsadebugd |
JVM: application profiling API | JVMTI | hprof |
JVM: threads, IO | JVM scheduler, JNI | jstack |
JVM: memory, GC | Built-in Garbage Collectors | jstat, jstatd, jmap, jhat |
JVM: universal monitoring API | JMX | jvisualvm |
JVM: process | Oracle/OpenJDK JRE | jps, jcmd, jinfo |
Containers | Docker | docker cli, docker api for Prometheus, Prometheus cAdvisor |
Message queues | n/u | vendor tools, prometheus exporters |
DBMS | Apache Derby / PostgreSQL | vendor tools, Prometheus pg_exporter, pg explain, pg analyse |
OS | Linux | ps, top |
Hardware | x86 | df , free , SNMP, Prometheus Node Exporter |
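
Tools such as jvisualvm read JVM data through JMX. A minimal sketch of the same kind of read done in-process against the platform MBean server; `JmxReadSketch` and its method names are hypothetical, while `java.lang:type=Memory` and its `HeapMemoryUsage` attribute are standard platform MXBean names.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical names for illustration only.
class JmxReadSketch {
    // Reads the "used" field of the standard HeapMemoryUsage attribute.
    static long heapUsedBytes() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (Exception e) {
            throw new IllegalStateException("JMX read failed", e);
        }
    }

    // Number of MBeans currently registered in this JVM.
    static int mbeanCount() {
        return ManagementFactory.getPlatformMBeanServer().getMBeanCount();
    }

    public static void main(String[] args) {
        System.out.println("heap used, bytes:  " + heapUsedBytes());
        System.out.println("registered MBeans: " + mbeanCount());
    }
}
```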
- Given rights for application folder to developer user
- ssh session to {{ prod }}:ansible_port
ssh -p {{ ansible_port }} {{ ansible_user }}@{{ prod }}
- Forked application codebase to student's account
- Application built at {{ prod }}
cd /opt
git clone --branch master --depth 1 https://github.com/{{ STUDENT_ACCOUNT }}/agile-practices-application
cd agile-practices-application
mvn clean verify [-DskipTests]
- Application ran at {{ prod }}
cd /opt/agile-practices-application
rm -rf dbo-db
nohup \
java \
-Xms128m -Xmx128m \
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=heapdump.hprof \
-Dderby.stream.error.file=log/derby.log \
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.rmi.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=0.0.0.0 \
-jar target/dbo-1.0-SNAPSHOT.jar \
--spring.profiles.active=qa \
--server.port=8080 \
> /dev/null 2>&1 &
- Load emulation ran at dev station
jmeter -n -t load.jmx -Jremote_hosts=127.0.0.1 -Dserver.rmi.ssl.disable=true
- CLI tools used at {{ prod }}
df -ah
free -m
docker images -a
docker ps -a
ps -ef
ps -eaux --forest
ps -eT | grep <pid>
top
top + 'f'
top -p <pid>
top -H -p <pid>
jps [-lvm]
jcmd <pid> help
jcmd <pid> VM.uptime
jcmd <pid> VM.system_properties
jcmd <pid> VM.flags
- Web applications used from dev station
http://{{ prod }}:8080/dbo/swagger-ui.html
http://{{ prod }}:8080/dbo/actuator/health
http://{{ prod }}:8080/dbo/actuator
http://{{ prod }}:8080/dbo/actuator/prometheus
http://{{ prod }}:9090/alerts
http://{{ prod }}:9090/graph
http://{{ prod }}:9090/graph?g0.range_input=15m&g0.tab=0&g0.expr=http_server_requests_seconds_count
- JMeter load emulation stopped at dev station
- Application gracefully stopped at {{ prod }}
curl --request POST http://{{ prod }}:8080/dbo/actuator/shutdown
rm -rf dbo-db
- Free HDD space? Free RAM?
- How many JVMs running?
- What DBMS used for application?
- What JVM version used for application? What are the parameters, properties and arguments used?
- How many Docker containers are running? What images used?
- What are the health indicators for the application?
- What is the application uptime?
- What is the CPU usage for application?
- How many http requests servlet container handled by different URLs?
- How many http sessions are active?
- What is the current system load average?
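
Several of these answers (uptime, JVM version and arguments, system load average) are also exposed by the standard platform MXBeans. A minimal sketch with hypothetical names (`AppStatusSketch` and its helpers); it reports on the JVM it runs in, so for the application at {{ prod }} the same data has to be read remotely, for example over JMX or from the actuator endpoints.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical names for illustration only.
class AppStatusSketch {
    // Milliseconds since this JVM started.
    static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    static String jvmVersion() {
        return ManagementFactory.getRuntimeMXBean().getVmVersion();
    }

    // JVM options such as -Xmx and -D flags (program arguments are not included).
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Load average for the last minute; negative when the platform does not provide it.
    static double systemLoadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    public static void main(String[] args) {
        System.out.println("uptime, ms:    " + uptimeMillis());
        System.out.println("JVM version:   " + jvmVersion());
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("load average:  " + systemLoadAverage());
    }
}
```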
- On-heap and off-heap architectures
- GC algorithms
- Memory structures for typical GCs
- Creating
- Analysing
- Memory parameters tuning
- Analyse metrics with Prometheus
- Heap dump analysing
- Add new metrics to checklist by tier: JVM
- Given workload
- Analyse metrics with Prometheus
- Analyse remote heap dump
- Make issue hypothesis report and resolving plan
- Leaks
- OOME for different generations
- stop-the-world problem
- GC trade-off for latency and throughput
- GC statistics monitoring
- New metrics to checklist by tier: JVM
- Given workload tool and test plan
- Analyse GC settings
- Analyse GC statistics with Prometheus
- Make issue hypothesis report and resolving plan
jcmd <pid> GC.heap_dump /tmp/dump.hprof
jmap -dump:live,format=b,file=/tmp/dump.hprof <pid>
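
Alongside heap dumps, GC statistics and memory pool usage can be sampled in-process through the standard platform MXBeans. A minimal sketch with hypothetical names (`GcStatsSketch` and its methods); collector and pool names depend on the GC algorithm selected for the JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for illustration only.
class GcStatsSketch {
    // One line per collector: how often it ran and how long it took in total.
    static List<String> collectorSummaries() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    // One line per memory pool (young and old generation spaces, Metaspace, etc.).
    static List<String> memoryPoolSummaries() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue;   // the pool is no longer valid
            }
            lines.add(pool.getName() + " [" + pool.getType() + "]: usedBytes=" + usage.getUsed());
        }
        return lines;
    }

    public static void main(String[] args) {
        collectorSummaries().forEach(System.out::println);
        memoryPoolSummaries().forEach(System.out::println);
    }
}
```

Sampling these values twice under load and comparing the deltas gives a rough view of how much time the collectors take.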
- Threads
- Scheduler and preemptive concurrency
- Scheduling overhead
- Green and native threads
- Thread states
- Types of blocking/waiting
- Making thread dump and analysing with IDE
- Making thread dump and analysing with Profiler
- Monitoring threads online with local JMX Profiler
- Analyse thread statistics with Prometheus
- Thread pooling patterns
- Threading patterns for connections
- Threading patterns for logic processing
- Data access concurrency architectures
- Cooperative concurrency application architecture
- Parallelism issues and patterns
- Concurrency issues and patterns
- New metrics to checklist by tier: JVM
- Given workload
- Analyse thread statistics with Prometheus
- Make issue hypothesis report and resolving plan
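
Two of the thread states that show up in thread dumps can be reproduced in a few lines. A minimal sketch with hypothetical names (`ThreadStatesSketch` and its methods); the polling helper exists only to make the demo deterministic and is not a production pattern.

```java
// Hypothetical names for illustration only.
class ThreadStatesSketch {
    // BLOCKED: the thread wants a monitor that another thread currently holds.
    static Thread.State blockedState() {
        Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // entered only after the caller releases the lock
            }
        }, "contender");
        contender.setDaemon(true);
        synchronized (lock) {
            contender.start();
            return awaitState(contender, Thread.State.BLOCKED);
        }
    }

    // WAITING: the thread called wait() and sleeps until notified or interrupted.
    static Thread.State waitingState() {
        Object monitor = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                try {
                    monitor.wait();
                } catch (InterruptedException ignored) {
                    // interrupted by the caller to end the demo
                }
            }
        }, "waiter");
        waiter.setDaemon(true);
        waiter.start();
        Thread.State seen = awaitState(waiter, Thread.State.WAITING);
        waiter.interrupt();
        return seen;
    }

    // Polls for up to two seconds until the thread reaches the expected state.
    static Thread.State awaitState(Thread thread, Thread.State expected) {
        long deadline = System.nanoTime() + 2_000_000_000L;
        while (thread.getState() != expected && System.nanoTime() < deadline) {
            Thread.yield();
        }
        return thread.getState();
    }

    public static void main(String[] args) {
        System.out.println("contender: " + blockedState());
        System.out.println("waiter:    " + waitingState());
    }
}
```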
- Synchronous IO concept
- Building blocks
- Sync remote call implementation
- Encoding
- Buffering
- Blocking for user data
- Excessive IO classes objects allocation
- Closing resources
- Pooling resources
- Given workload
- Analyse IO operations with Prometheus and logs
- Make issue hypothesis report and resolving plan
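
A minimal sketch of the synchronous IO points listed in this block (explicit encoding, buffering, closing resources), using only the JDK. `SyncIoSketch` and `roundTrip` are hypothetical names, and the temporary file stands in for any blocking data source.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for illustration only.
class SyncIoSketch {
    // Writes the lines to a temporary file and reads them back, blocking on every call.
    static List<String> roundTrip(String... lines) {
        try {
            Path file = Files.createTempFile("sync-io-sketch", ".txt");
            try {
                // Encoding is explicit, so the result does not depend on the platform default.
                // try-with-resources closes the writer even when write() fails.
                try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    for (String line : lines) {
                        out.write(line);   // goes to an in-memory buffer first
                        out.newLine();
                    }
                }
                List<String> read = new ArrayList<>();
                try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {   // blocks until data is available
                        read.add(line);
                    }
                }
                return read;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("soup", "steak"));
    }
}
```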
- Asynchronous IO concept
- NIO building blocks
- Async remote call implementation
- Code complexity
- Error handling
- Response time
- New metrics to checklist by tier: JVM, OS
- Given workload
- Analyse IO operations with Prometheus and logs
- Make issue hypothesis report and resolving plan
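
A minimal sketch of the asynchronous style using the JDK's `AsynchronousFileChannel`: the read is started, the caller stays free, and the result is collected later. `AsyncIoSketch`, `readAsync` and `roundTrip` are hypothetical names. A file is used so the example needs no network, and a single read is assumed to be enough for a small file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical names for illustration only.
class AsyncIoSketch {
    static String readAsync(Path file) {
        try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            Future<Integer> pending = channel.read(buffer, 0);   // returns immediately
            // The calling thread could do other work here.
            pending.get();                                       // block only when the data is needed
            buffer.flip();
            return StandardCharsets.UTF_8.decode(buffer).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // Error handling: the IO failure arrives wrapped, away from the call site.
            throw new IllegalStateException(e.getCause());
        }
    }

    // Writes the text to a temporary file and reads it back asynchronously.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("async-io-sketch", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return readAsync(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello nio"));
    }
}
```

The three catch branches hint at the code complexity and error handling costs listed above.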
- JDBC API
- Driver types
- Prefetching tuning
- Prepared statements
- Batch operations
- Transactions
- Isolation levels
- Connection pools
- Database CRUD implementation with low-level JDBC API
- Database CRUD implementation with Spring JDBC Template
- Given workload
- Analyse DB operations
- Make issue hypothesis report and resolving plan
- JPA API
- Caching levels
- JPA transactions architecture
- Spring Data JPA
- Transaction management
- Database CRUD implementation with Spring Data JPA
- New metrics to checklist by tier: JPA, JVM
- Given workload
- Analyse DB operations
- Make issue hypothesis report and resolving plan
- Value taken
- Process Improvement Actions
- Training Improvement Actions
- Docker overview
- Docker containers
- Docker images
- Image provisioning and repositories
- Application containerization
- Configuring container and resource limits
- Running and monitoring container
- Memory issues and patterns
- Disk I/O issues and patterns
- New metrics to checklist by tier: JVM, OS
- Given workload
- Modify container configuration with K8s config
- Analyse system metrics with Prometheus
- Make issue hypothesis report and resolving plan
- Why caches?
- Caching architecture: levels
- Caching proxy
- Java Cache API
- Spring application caching
- JPA cache levels
- DB caches
- Cold start
- Hit statistics
- Cache resetting and inconsistency
- New metrics to checklist by tier: caches
- Given workload
- Analyse application caches configuration
- Analyse caches statistics
- Make issue hypothesis report and resolving plan
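
Cold start and hit statistics can be seen on a toy cache. A minimal sketch, assuming a hypothetical `CountingCacheSketch` class built on the JDK's `LinkedHashMap` with LRU eviction; it is not the Spring or JPA cache implementation, which expose comparable counters through their own metrics.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical class for illustration only.
class CountingCacheSketch<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader;
    private long hits;
    private long misses;

    CountingCacheSketch(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first in iteration order.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;   // evict once the capacity is exceeded
            }
        };
    }

    synchronized V get(K key) {
        V cached = entries.get(key);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;                           // cold start or evicted entry
        V loaded = loader.apply(key);
        entries.put(key, loaded);
        return loaded;
    }

    synchronized long hits() {
        return hits;
    }

    synchronized long misses() {
        return misses;
    }

    synchronized double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        CountingCacheSketch<Integer, String> cache = new CountingCacheSketch<>(2, k -> "v" + k);
        cache.get(1);
        cache.get(1);
        cache.get(2);
        System.out.println("hit ratio: " + cache.hitRatio());
    }
}
```

A low hit ratio right after startup is expected; one that stays low under steady load suggests a capacity that is too small or keys that rarely repeat.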
- Black-box approach
- Load test structure
- Load tests suite
- Metrics to analyse
- Load
- Stress
- Spike
- Redundancy
- Agent architecture
- Test plan
- Configuring reports
- Running workload
- Report analysis
- Configuring workload plan
- Running workload
- Analysing reports
- Issue hypothesis
- Java logging libraries hell
- SLF4J and Logback overview
- Logging architecture
- Application configuration
- Configuring application local logging
- Distributed logging collection architecture with ELK stack
- Application configuration
- Searching and analysing logs
- Configuring application distributed logging
- Given configuration
- Load tests run
- Analysing logs with ELK
- Prometheus architecture overview
- Metrics sources and agents
- Analysing monitoring dashboards and alerts
- New metrics checklist by tier: system and OS
- Configuring hardware metrics dashboard and alerts with Prometheus
- Given configuration
- Load tests run
- Analysing metrics and alerts with Prometheus
- New metrics to checklist by tier: JVM
- Configuring JVM through JMX metrics with Prometheus
- Given configuration
- Load tests run
- Analysing metrics and alerts with Prometheus
- Make issue hypothesis report and resolving plan
- DB request processing
- DB execution plan
- Constraints
- Indexes
- Transactions implementation architectures
- "Vacuum" side effects
- Profiling DB request with explain
- New metrics to checklist by tier: DBMS, OS
- Given workload
- Analyse DB schema
- Analyse requests profiles
- Make issue hypothesis report and resolving plan
Typical CI/CD pipeline overview (1.5)*
- Distributed systems: Seasons in the Abyss
- CAP theorem
- Data storage architectures overview in CAP terms
- Why microservices?
- Data encapsulation
- Gateway
- Services discovery
- Data duplication
- Distributed transactions
- Monitoring patterns
- Tracing patterns
- Business operation in microservices environment tracing
- Given workload
- Analyse distributed architecture
- Analyse call trace
- Play "hell monkey"
- Make issue hypothesis report and resolving plan