Георгий Каляев

Senior

Registered: 28.05.2026

Specialization: Performance Engineer

Performance Engineer with 7+ years validating enterprise and high-load distributed systems in fintech, asset management, banking, insurance, e-commerce, commodity exchange, and regulated government sectors. I work at the intersection of quality, performance, and platform engineering: workload modelling, LTM sign-off, load campaigns, RCA, SLA risk forecasting, and actionable recommendations for dev, DevOps/SRE, and architecture teams, delivered before peak traffic or production incidents.
● Full lifecycle: statistics and load models; LTM, profiles, success and acceptance criteria; load, stress, volume, soak, max-performance, and performance regression testing; NFR validation (throughput, p95/p99, SLA); reports and peak-readiness sign-off; integration into release cycles.
● Capacity planning: growth models, degradation forecast, partitioning and architecture recommendations backed by repeated load tests.
● Observability on LT stands: Grafana, Telegraf, InfluxDB, Prometheus, ELK, AppDynamics, Splunk, Zabbix.
● JVM profiling (VisualVM, GC, threads).
● Stubs: Spring Boot, MockServer, Axis2/SOAP.
● NGINX with GOST and Lua in regulated environments.
● Distributed LT on Kubernetes (Kangal + JMeter).
● Automation: Python, Java; GitLab CI, Jenkins.
Key results:
● Asset management cloud (2025–present): PostgreSQL capacity planning with 5 growth models (0→100M+ rows), ~90% forecast accuracy; INSERT degradation up to 8.3× (2.7 s→22–77 s); SLA breach risk in ~11 months; recommended partitioning and indexes, which dev implemented; INSERT 20–70 s→≤1 s, validated to 2B rows; capacity ~30 years vs ~11 months before the fix.
● Introduced Kangal + JMeter on K8s as the team standard; 15+ REST microservices per release cycle.
● Batch/K8s: throughput capped by a 1-min cron, not pod count; Kafka was not the bottleneck; recommended an event-driven design.
● Open-source jmeter-load-profile-checker: step analysis ~5–6 h→~30 min (~90%).
● IBS (2020–2025): Senior Performance Engineer, led ~5 engineers. Team Player 2023, Project Driver 2022.
● E-commerce: Gatling, 5000+ users; night production runs; ~900→2600+ orders/h (+189%) through seasonal peak without outages.
● Exchange: ~10,000 WebSocket msg/s (STOMP), RabbitMQ.
● Guidewire insurance: LTM, PREPROD→LT stand→prod extrapolation; JMeter, LoadIT, AppDynamics, Splunk.
● SAP ERP/BW/Fiori (LoadRunner).
● Leroy Merlin: Gatling, 2–3 years DB growth model.
● Government GIS: NGINX GOST/Lua, regulated SLA; published an article on NGINX + JMeter with gov certificates.
● VisualVM: thread leak found on soak; fixed before release.
● ScriptMaster / Alfa-Bank (2019–2020): FSSP max-performance testing; LoadRunner + IBM MQ ~6000 msg/s; SOAP stubs in Java/Axis2; HornetQ bash monitoring, queues stabilized after memory tuning; Oracle AWR ~40% improvement; JVM/GC analysis.
● Currently completing mentorship in test automation to broaden functional and automation coverage.
● English fluent, Russian native.
● Open to relocation.

Skills

JMeter

LoadRunner

Gatling

Locust

Kangal

Kubernetes

PostgreSQL

Kafka

RabbitMQ

Redis

Oracle

SAP

NGINX

Grafana

Prometheus

ELK

Java

Python

SQL

Bash

Work experience

Principal Development Engineer

01.2025 - Present |First Asset Management

Grafana, Prometheus, Zabbix, ELK/OpenSearch, HAProxy, Redis

Russian asset management company (mutual funds, ETFs, discretionary portfolios; very similar to Vanguard).
● Proposed PostgreSQL table partitioning and index tuning based on load-test evidence; the development team implemented the changes. INSERT latency dropped from 20–70 s to ~1 s or less (validated to 2B rows), eliminating the scalability bottleneck projected to breach SLA in ~11 months at current growth.
● PostgreSQL capacity planning: built 5 growth models (0 → 100M+ rows, ~19 GB data, ~12 GB indexes at peak); ~90% forecast accuracy; as data volume grew, INSERT latency increased from ~2.7 s to 22–77 s (up to 8.3×), with up to 800 MB disk read per operation.
● Introduced and standardized Kangal + JMeter on Kubernetes as the default load-testing platform: performance-tested 15+ REST microservices across release cycles, with on-demand load generators in an isolated namespace, horizontal scaling of JMeter workers, and automatic teardown after runs; distributed campaigns without dedicated hardware sitting idle between test windows; adopted by the team for all release-cycle runs.
● Batch processing in K8s: throughput limited by the 1-minute cron batch, not pod count (1 vs 3 pods: no gain); Kafka lag analysis showed Kafka was not the bottleneck; recommended an event-driven / worker pool design.
● Built and published jmeter-load-profile-checker (GitHub) to validate JMeter step profiles (plateau without ramp-down); reduced step-profile analysis from ~5–6 hours to ~30 minutes per campaign.
● NFR validation (p95/p99 latency, throughput); correlation with Grafana, Prometheus, Zabbix, ELK/OpenSearch, HAProxy, Redis.
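The batch-processing finding above (1 vs 3 pods, no gain) can be illustrated with a toy model. This is a minimal sketch, not the project's actual analysis: the function name, the batch size, and the per-item time are hypothetical, and the model assumes each cron tick releases a fixed batch that pods share evenly, with idle pods waiting for the next tick.

```python
import math


def batch_throughput_per_min(pods, batch_size, per_item_s, cron_interval_s=60):
    """Items completed per minute when work is released once per cron tick.

    Assumed model (hypothetical): every tick releases `batch_size` items,
    pods split the batch evenly, and a new batch starts only on a tick
    after the previous batch has drained.
    """
    # Time for the pod pool to drain one batch.
    drain_s = math.ceil(batch_size / pods) * per_item_s
    # A batch always occupies at least one full tick, even if pods finish early.
    ticks_per_cycle = max(1, math.ceil(drain_s / cron_interval_s))
    cycle_s = ticks_per_cycle * cron_interval_s
    return batch_size * 60 / cycle_s


if __name__ == "__main__":
    # Illustrative numbers only: 100 items per tick, 0.2 s per item.
    for pods in (1, 3):
        print(pods, batch_throughput_per_min(pods, batch_size=100, per_item_s=0.2))
```

With these illustrative inputs both pod counts give the same throughput, because a single pod already drains the batch well inside the 60-second interval. Under this model extra pods only help once the drain time exceeds the cron interval, which is why an event-driven or worker-pool design removes the cap.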

Senior Performance Test Engineer

05.2020 - 01.2025 |IBS

SAP, RabbitMQ, STOMP, Java, SBT, PostgreSQL, Kubernetes, Grafana

Large IT consulting and system integrator (very similar to EPAM or Accenture). Embedded in client teams across banking, insurance, retail/e-commerce, exchange, and government.
● Led a load-testing team of ~5 engineers: campaign planning, mentoring, onboarding, technical interviews, and hiring decisions.
● Performance-tested SAP ERP, HANA, Fiori, and SAP BW systems using SAP GUI and SAP Web protocols, validating transactional and analytical workloads under concurrent user load.
● Designed and executed high-load performance scenarios for Guidewire-based insurance systems, simulating peak business-critical workflows and monitoring via AppDynamics and Splunk.
● Simulated end-to-end trading workflows under high-frequency market conditions, processing up to 10,000 WebSocket messages per second and validating the resilience of the message-driven architecture (RabbitMQ, STOMP).
● Developed Gatling-based high-load testing frameworks for Cooper (SberMarket), modelling B2B/B2C and mobile order flows (5000+ users in load profile). Contributed to increasing key scenario throughput from ~900 to 2600+ orders/hour (+189%) while keeping the platform stable through seasonal peak traffic; ran night tests on production (11:00 PM – 3:00 AM) where no dedicated performance environment existed.
● Designed and executed Gatling-based high-load tests for Leroy Merlin retail systems (Java, SBT), simulating 2–3 years of production database growth and Redis-intensive workloads. Monitored PostgreSQL and Kubernetes infrastructure via Grafana, validating platform scalability and performance stability under large-scale data expansion.
● Performed performance validation for a national financial system in a highly regulated government environment. Configured NGINX with GOST encryption and implemented an observability stack (Telegraf, InfluxDB, Grafana).
● Defined and validated non-functional requirements (NFRs), including latency thresholds (p95/p99), throughput targets, and scalability KPIs.
● JVM / VisualVM: during soak testing, remote VisualVM monitoring showed threads not shutting down; the live thread count grew steadily under load (not visible in standard dashboards). Reported the root cause to development, and the fix was applied right after the load-test findings. Also analyzed GC, heap utilization, and thread contention.
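The NFR bullet above refers to p95/p99 latency thresholds. As a minimal sketch of how such a check could be automated, the snippet below computes nearest-rank percentiles over latency samples and compares them with limits. The function names and the threshold values are hypothetical and are not taken from any project described here.

```python
import math


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest rank: smallest value with at least p% of samples at or below it.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_nfr(samples_ms, limits_ms):
    """Map each percentile in `limits_ms` to True if it is within its limit.

    `limits_ms` is a hypothetical structure, e.g. {95: 800, 99: 1500}.
    """
    return {
        p: percentile_nearest_rank(samples_ms, p) <= limit
        for p, limit in limits_ms.items()
    }


if __name__ == "__main__":
    latencies_ms = list(range(1, 101))  # synthetic samples: 1..100 ms
    print(check_nfr(latencies_ms, {95: 96, 99: 98}))
```

Nearest-rank is only one of several percentile definitions; load-testing tools may interpolate instead, so results can differ slightly on small sample sets.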

Performance Test Engineer

07.2019 - 05.2020 |ScriptMaster

LoadRunner, Oracle, SQL

IT integrator on Alfa-Bank projects (major private bank; very similar to Citigroup). Core banking and legal workflows for 30M+ retail clients.
● Delivered performance validation for internal banking systems, ensuring stability under peak operational load.
● Created and maintained LoadRunner scripts simulating up to 5000 concurrent users across high-load legal request processing workflows.
● Simulated large-scale IBM MQ message flows to assess system resilience under peak transaction volumes.
● HornetQ (JBoss): internal message queues were not monitored by the platform, so I built a bash script to sample queue depth with timestamps during load tests, export to CSV, and summarize in Excel. The data showed backlog growth under load; recommended increasing the JBoss HornetQ memory limits (address/global-max-size), and queues stabilized after rollout.
● Optimized Oracle SQL queries using execution plan analysis and AWR reports, reducing average DB response time by ~25% and improving overall throughput by ~15%.
● Monitored JVM performance in JBoss environments, analyzing garbage collection impact and thread pool behavior.
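The HornetQ bullet above describes sampling queue depth to CSV and summarizing it afterwards. The original tooling was a bash script plus Excel; the snippet below is only a rough Python sketch of the summarizing step. The column names `timestamp` and `depth`, the function name, and the sample values are all assumptions made for illustration.

```python
import csv
import io


def summarize_queue_depth(csv_text):
    """Summarize queue-depth samples from CSV text.

    Assumes a header row with hypothetical columns `timestamp` and `depth`.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no samples")
    depths = [int(row["depth"]) for row in rows]
    return {
        "samples": len(depths),
        "first": depths[0],
        "last": depths[-1],
        "peak": max(depths),
        # Positive net growth over the run suggests a backlog is building up.
        "net_growth": depths[-1] - depths[0],
    }


if __name__ == "__main__":
    # Synthetic samples, not real measurements.
    sample = "timestamp,depth\n10:00:00,5\n10:00:10,40\n10:00:20,120\n"
    print(summarize_queue_depth(sample))
```

Net growth between the first and last sample is a crude signal; on a real run it would be more robust to look at the trend over the steady-state plateau only.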

Education

Business Informatics in Economics

2015 - 2019

Moscow International Academy of Higher Education

Information Systems (Bachelor's)

2014 - 2018

Plekhanov Russian University of Economics

Languages

Russian: Native

English: Advanced