Senior
Registered: 28.05.2026
Георгий Каляев
Specialization: Performance Engineer
— Performance Engineer with 7+ years validating enterprise and high-load distributed systems in fintech, asset management, banking, insurance, e-commerce, commodity exchange, and regulated government sectors.
— I work at the intersection of quality, performance, and platform engineering: workload modelling, LTM sign-off, load campaigns, RCA, SLA risk forecasting, and actionable recommendations for dev, DevOps/SRE, and architecture - before peak traffic or production incidents.
— Full lifecycle: statistics and load models; LTM, profiles, success and acceptance criteria; load, stress, volume, soak, max-performance, and performance regression testing; NFR validation (throughput, p95/p99, SLA); reports and peak-readiness sign-off; integration into release cycles.
— Capacity planning: growth models, degradation forecast, partitioning and architecture recommendations backed by repeated load tests.
— Observability on LT stands: Grafana, Telegraf, InfluxDB, Prometheus, ELK, AppDynamics, Splunk, Zabbix.
— JVM profiling (VisualVM, GC, threads).
— Stubs: Spring Boot, MockServer, Axis2/SOAP.
— NGINX with GOST and Lua in regulated environments.
— Distributed LT on Kubernetes (Kangal + JMeter).
— Automation: Python, Java; GitLab CI, Jenkins.
Key results:
— Asset management cloud (2025–present): PostgreSQL capacity planning — 5 growth models (0→100M+ rows), ~90% forecast accuracy; INSERT degradation up to 8.3× (2.7 s→22–77 s); SLA breach risk ~11 months → recommended partitioning/indexes → dev implemented → INSERT 20–70 s→≤1 s, validated to 2B rows; capacity ~30 years vs ~11 months before fix.
— Introduced Kangal + JMeter on K8s as team standard; 15+ REST microservices per release cycle.
— Batch/K8s: throughput capped by 1-min cron, not pod count; Kafka not bottleneck — recommended event-driven design.
— Open-source jmeter-load-profile-checker: step analysis ~5–6 h→~30 min (~90%).
— IBS (2020–2025), Senior Performance Engineer, led ~5 engineers. Team Player 2023, Project Driver 2022.
— E-commerce: Gatling, 5000+ users; night production runs; ~900→2600+ orders/h (+189%) through seasonal peak without outages.
— Exchange: ~10,000 WebSocket msg/s (STOMP), RabbitMQ.
— Guidewire insurance: LTM, PREPROD→LT stand→prod extrapolation; JMeter, LoadIT, AppDynamics, Splunk.
— SAP ERP/BW/Fiori (LoadRunner).
— Leroy Merlin: Gatling, 2–3 years DB growth model.
— Government GIS: NGINX GOST/Lua, regulated SLA; published article on NGINX + JMeter with gov certificates.
— VisualVM: thread leak on soak — fix before release.
— ScriptMaster / Alfa-Bank (2019–2020): FSSP max-performance testing; LoadRunner + IBM MQ ~6000 msg/s; SOAP stubs Java/Axis2; HornetQ bash monitoring — queues stabilized after memory tuning; Oracle AWR ~40% improvement; JVM/GC analysis.
— Currently completing mentorship in test automation to broaden functional and automation coverage.
— English fluent, Russian native.
— Open to relocation.
Java Performance Testing QA Python Load Testing Apache JMeter ELK Redis SQL Jenkins Jira Zabbix GitLab REST CI/CD methodologies Bash InfluxDB Bamboo ClickHouse SoapUI Scala MySQL Kubernetes PostgreSQL Postman IntelliJ IDEA Linux Nginx SAP Splunk Grafana GraphQL Confluence gRPC Kafka WebSockets LoadRunner Oracle SOAP Prometheus Git JBoss Docker JDBC RabbitMQ
Skills
JMeter
LoadRunner
Gatling
k6
Locust
Kangal
Kubernetes
PostgreSQL
Kafka
RabbitMQ
Redis
Oracle
SAP
Nginx
Grafana
Prometheus
ELK
Java
Python
SQL
Bash
Work Experience
Principal Development Engineer
01.2025 - Present |First Asset Management
Grafana, Prometheus, Zabbix, ELK/OpenSearch, HAProxy, Redis
Russian asset management company (mutual funds, ETFs, discretionary portfolios; very similar to Vanguard).
● Proposed PostgreSQL table partitioning and index tuning based on load-test evidence; the development team implemented the changes — INSERT latency dropped from 20–70 s to ~1 s or less (validated to 2B rows), eliminating the scalability bottleneck projected to breach SLA in ~11 months at current growth.
● PostgreSQL capacity planning: built 5 growth models (0 → 100M+ rows, ~19 GB data, ~12 GB indexes at peak); ~90% forecast accuracy; as data volume grew, INSERT latency increased from ~2.7 s to 22–77 s (up to 8.3×), with up to 800 MB disk read per operation.
● Introduced and standardized Kangal + JMeter on Kubernetes as the team's default load-testing platform, used for all release-cycle runs: performance-tested 15+ REST microservices across release cycles with on-demand load generators in an isolated namespace, horizontal scaling of JMeter workers, and automatic teardown after runs, so distributed campaigns need no dedicated hardware sitting idle between test windows.
● Batch processing in K8s: throughput limited by 1-minute cron batch, not pod count (1 vs 3 pods — no gain); Kafka lag analysis — not the bottleneck; recommended event-driven / worker pool design.
● Built and published jmeter-load-profile-checker (GitHub) to validate JMeter step profiles (plateau without ramp-down); reduced step-profile analysis from ~5–6 hours to ~30 minutes per campaign.
● NFR validation (p95/p99 latency, throughput); correlation with Grafana, Prometheus, Zabbix, ELK/OpenSearch, HAProxy, Redis.
Senior Performance Test Engineer
05.2020 - 01.2025 |IBS
SAP, RabbitMQ, STOMP, Java, SBT, PostgreSQL, Kubernetes, Grafana
Large IT consulting and system integrator (very similar to EPAM or Accenture). Embedded in client teams across banking, insurance, retail/e-commerce, exchange, and government.
● Led a load-testing team of ~5 engineers: campaign planning, mentoring, onboarding, technical interviews, and hiring.
● Performed performance testing of SAP ERP, HANA, Fiori and SAP BW systems using SAP GUI and SAP Web protocols, validating transactional and analytical workloads under concurrent user load.
● Designed and executed high-load performance scenarios for Guidewire-based insurance systems, simulating peak business-critical workflows and monitoring via AppDynamics and Splunk.
● Simulated end-to-end trading workflows under high-frequency market conditions, processing up to 10,000 WebSocket messages per second and validating resilience of message-driven architecture (RabbitMQ, STOMP).
● Developed Gatling-based high-load testing frameworks for Cooper (SberMarket), modelling B2B/B2C and mobile order flows (5000+ users in the load profile). Contributed to increasing key-scenario throughput from ~900 to 2600+ orders/hour (+189%) while the platform stayed stable through seasonal peak traffic. Because no dedicated performance environment existed, tests ran on production at night (11:00 PM to 3:00 AM).
● Designed and executed Gatling-based high-load tests for Leroy Merlin retail systems (Java, SBT), simulating 2–3 years of production database growth and Redis-intensive workloads. Monitored PostgreSQL and Kubernetes infrastructure via Grafana, validating platform scalability and performance stability under large-scale data expansion.
● Performed performance validation for a national financial system in a highly regulated government environment. Configured NGINX with GOST encryption and implemented observability stack (Telegraf, InfluxDB, Grafana).
● Led technical interviews and contributed to hiring decisions, mentoring and onboarding new performance engineers.
● Defined and validated non-functional requirements (NFRs), including latency thresholds (p95/p99), throughput targets, and scalability KPIs.
● JVM / VisualVM: during soak testing, remote VisualVM monitoring showed threads not shutting down — live thread count grew steadily under load (not visible in standard dashboards); reported root cause to development; fix applied right after load-test findings. Also analyzed GC, heap utilization, and thread contention.
Performance Test Engineer
07.2019 - 05.2020 |ScriptMaster
LoadRunner, Oracle, SQL
IT integrator on Alfa-Bank projects (major private bank; very similar to Citigroup). Core banking and legal workflows for 30M+ retail clients.
● Delivered performance validation for internal banking systems used by 30+ million clients, ensuring stability under peak operational load.
● Created and maintained LoadRunner scripts simulating up to 5000 concurrent users across high-load legal request processing workflows.
● Simulated large-scale IBM MQ message flows to assess system resilience under peak transaction volumes.
● HornetQ (JBoss): internal message queues were not monitored by the platform — built a bash script to sample queue depth and timestamp during load tests, export to CSV, and summarize in Excel; showed backlog growth under load; recommended JBoss HornetQ memory limit increase (address/global-max-size) — queues stabilized after rollout.
● Optimized Oracle SQL queries using execution plan analysis and AWR reports, reducing average DB response time by ~25% and improving overall throughput by ~15%.
● Monitored JVM performance in JBoss environments, analyzing garbage collection impact and thread pool behavior.
Education
Business Informatics in Economics
2015 - 2019
Moscow International Academy of Higher Education
Information Systems (Bachelor's degree)
2014 - 2018
Plekhanov Russian University of Economics
Languages
Russian: Native
English: Advanced
