Monitoring and Observability Engineer

Salary: Negotiable
Full-time
Remote

This role involves designing, implementing, and managing comprehensive monitoring solutions using Prometheus, Grafana, SNMP-Exporter, Streaming Telemetry, OpenTelemetry, and other related technologies.


Responsibilities

- Design, implement, and manage Prometheus-based monitoring solutions, including configurations and alert rules.

- Develop and maintain interactive and visually appealing Grafana dashboards.

- Configure SNMP modules and scrape jobs to collect SNMP metrics efficiently across different network technologies.

- Use Git confidently in day-to-day work: clone working branches, develop changes, and commit to the main branch, or follow another workflow that demonstrates a strong command of Git.

- Identify and onboard new metrics from various systems and applications, developing data pipelines for metrics collection and storage.

- Optimize and scale monitoring environments to handle large volumes of metrics and ensure comprehensive monitoring coverage.

- Implement and manage Streaming Telemetry solutions for real-time data collection and monitoring.

- Integrate and manage OpenTelemetry for comprehensive tracing and observability across services.

- Troubleshoot and resolve issues related to data collection, monitoring configurations, and dashboard performance.

- Work with DevOps, development, and operations teams to ensure applications and infrastructure are properly instrumented.

- Document configurations and procedures, and provide training to team members and stakeholders.

 

Skills

- Familiarity with network monitoring tools and practices.

- Extensive experience with Prometheus and related technologies (Alertmanager, Pushgateway, etc.).

- Strong knowledge of time-series databases and monitoring concepts.

- Proficiency in writing Prometheus queries (PromQL).

- Strong experience with Grafana and its ecosystem.

- Proficiency in creating and managing Grafana dashboards and panels.

- Knowledge of data visualization principles and best practices.

- Familiarity with monitoring and observability tools and practices.

- Strong knowledge of SNMP protocols and network device management.

- Experience with SNMP-Exporter and its integration with Prometheus.

- Strong skills in creating SNMP modules and scrape configs for various network technologies.

- Strong Git experience.

- Strong understanding of metrics and monitoring concepts.

- Experience with metrics collection tools (Prometheus, Telegraf, Collectd, etc.).

- Experience with Streaming Telemetry solutions for real-time monitoring.

- Experience with OpenTelemetry for tracing and observability.

- Familiarity with Linux/Unix systems and scripting languages (Bash, Python).

- Experience with containerization and orchestration tools (Docker, Kubernetes).

 

Qualifications

- Bachelor’s degree in Computer Science, Engineering, or a related field.

- 5+ years of experience in monitoring and observability roles.

- Proficiency in tools such as Prometheus, Grafana, PromQL, Alertmanager, Alert Framework, GitHub, SNMP-Exporter, Streaming Telemetry, and OpenTelemetry (OTel).

- Strong coding and scripting skills.

- Excellent problem-solving abilities and attention to detail.

- Strong communication and teamwork skills.