Incident Manager

Salary: Negotiable
Office
Remote
Full-time
Permanent position
We are looking for an Incident Manager to work on a trading product.

Requirements:
- 1+ years of experience designing, analyzing, troubleshooting, supporting, and resolving issues in a multi-tier application architecture, especially service-oriented and microservices architectures requiring 24/7 availability;
- Experience with SQL queries;
- Basic knowledge of databases: Oracle (PL/SQL) and/or PostgreSQL;
- Basic Linux knowledge (awk, sed, bash, cat, grep, etc.);
- Understanding of AWS (VPC, EC2, ECS, Route 53, S3);
- Version control systems: Git;
- Basic knowledge of networking;
- Good analytical and troubleshooting skills.

Will be a plus:
- Linux systems and web servers (Nginx, Tomcat);
- Experience with DevOps tools (Docker, Jenkins, GitLab CI, Terraform, etc.);
- Understanding of JVM configuration;
- Understanding of REST APIs and gRPC;
- Experience implementing high-load applications;
- Experience as a Software Engineer; experience in the financial, Forex, or gaming industries is preferred;
- Experience working with JIRA;
- Familiarity with Logstash, Kibana, and Elasticsearch;
- Familiarity with Zabbix or Prometheus;
- Understanding of how services work with message brokers (Kafka, SQS/SNS, ESB);
- Scripting languages: Bash, Python.

Tasks and responsibilities:
- Monitoring reporting systems in production, resolving current issues, and improving how the systems operate: finding errors in logs and performance degradations, detecting problems in the interaction between services, analyzing application performance metrics and system metrics for the hosts the application is deployed on, and creating tasks for the development team to fix the problems found;
- Resolving incidents in the application, analyzing their causes, and coordinating with other teams to restore normal operation of the application;
- Managing the build, release, and configuration of the application in production;
- Deploying, automating, maintaining, and managing an AWS cloud-based environment for availability, performance, scalability, and security; managing dev and QA environments;
- Analyzing and making recommendations regarding technology improvements, upgrades, and modifications.