senior
Registered: 11.05.2022

Valerii Zhuk

Specialization: Data Engineer
Areas of interest:
- Data engineering.
- Software development.
- (Near) real-time analysis.
- Running ML models in production.

Portfolio

EPAM Systems Inc.

- Software engineering.
- Customer communication.
- Hadoop infrastructure support.
- Interviewing interns and mentoring internal projects.
- Prototyping potential architecture improvements.
1. Data lake development for a healthcare reporting platform. Technologies: MongoDB, Apache Spark (Core), Scala, AWS S3, AWS EMR, TeamCity, Sumo Logic.
2. Building an end-to-end data pipeline: ingestion, transformation, and reporting on top of Apache Hive. Technologies: Apache Hive, MapReduce, HDFS, Python, Bash, HiveQL, Nexus, Jenkins.
3. Complex data streaming platform handling telemetry data. Technologies: Apache Hive, Azure Data Lake, Java, Apache Spark (Spark Dataset API, Spark Streaming), Redis, Azure HDInsight.
4. ETL pipeline development. Technologies: Apache Impala, Apache Spark (Spark Dataset API), Apache Kafka, Parquet, Avro, StreamSets, Airflow, Cloudera.
5. Serverless image recognition pipeline development. Technologies: AWS S3, AWS Lambda, AWS EC2, Docker, Python, ANN.

Grid Dynamics

1. Data platform for marketing data.
   - Designing and implementing data pipelines.
   - Optimizing ETL pipelines and Spark jobs.
   - Fixing bugs.
2. Cloud data platform for a manufacturer.
   - Architecting and documenting features.
   - Feature development.
   - Integrating ML models into data pipelines.

DataArt

- Software engineering:
  - Bug fixes.
  - Ingestion of new sources.
  - ETL pipeline development.
  - Participating in architecture design and prototyping.

Skills

Java
Scala
Big Data
Git
Apache Hive
Linux
Bash
SQL
Apache Spark
Spark Streaming
Apache Kafka
Jenkins
Apache Hadoop
Hortonworks
Cloudera
Avro
Parquet
JSON
Jupyter Notebooks
AWS S3
AWS EMR
AWS EC2
AWS Lambda
Azure Data Lake
Azure HDInsight
Azure EventHubs
Airflow
Apache Cassandra

Work experience

Data Engineer
07.2021 - 05.2022 | Grid Dynamics
Python, Apache Spark, AWS EMR, Airflow, Spark SQL, AWS S3, Lambda, SQS, ECR, Snowflake
1. Data platform for marketing data.
   - Designing and implementing data pipelines.
   - Optimizing ETL pipelines and Spark jobs.
   - Fixing bugs.
2. Cloud data platform for a manufacturer.
   - Architecting and documenting features.
   - Feature development.
   - Integrating ML models into data pipelines.
Data Engineer
06.2020 - 07.2021 | DataArt
- Software engineering:
  - Bug fixes.
  - Ingestion of new sources.
  - ETL pipeline development.
  - Participating in architecture design and prototyping.
Software engineer
10.2019 - 06.2020 | Perfect Art Inc.
Python, Apache Hive, HiveQL, Apache Spark, Jupyter
Fuel efficiency analysis:
- Dataflow analysis and optimization.
- Data mart creation.
- Development of basic automated data quality tests (Deequ).
- Optimization of existing Hive transformations.
Software engineer
05.2019 - 10.2019 | DINS
Scala, Apache Spark, Apache Kafka, Postgres, Apache Impala, HDFS
Data platform:
- Software development (features, pipelines, modernization).
- Supporting the Hadoop cluster.
- Service architecture refactoring.
Software engineer
07.2018 - 04.2019 | Nexign Systems, Saint Petersburg
Kotlin (Java), Spring, Spring Boot, Postgres, Apache Cassandra, Apache Spark, Spark Streaming, Kafka Streams, Apache Hive
- Software engineering.
- Product migration from traditional storage systems.
Software engineer
07.2015 - 07.2018 | EPAM Systems Inc. (Russia), Saint Petersburg
MongoDB, Apache Spark (Core), Scala, AWS S3, AWS EMR, TeamCity, Sumo Logic
- Software engineering.
- Customer communication.
- Hadoop infrastructure support.
- Interviewing interns and mentoring internal projects.
- Prototyping potential architecture improvements.
1. Data lake development for a healthcare reporting platform. Technologies: MongoDB, Apache Spark (Core), Scala, AWS S3, AWS EMR, TeamCity, Sumo Logic.
2. Building an end-to-end data pipeline: ingestion, transformation, and reporting on top of Apache Hive. Technologies: Apache Hive, MapReduce, HDFS, Python, Bash, HiveQL, Nexus, Jenkins.
3. Complex data streaming platform handling telemetry data. Technologies: Apache Hive, Azure Data Lake, Java, Apache Spark (Spark Dataset API, Spark Streaming), Redis, Azure HDInsight.
4. ETL pipeline development. Technologies: Apache Impala, Apache Spark (Spark Dataset API), Apache Kafka, Parquet, Avro, StreamSets, Airflow, Cloudera.
5. Serverless image recognition pipeline development. Technologies: AWS S3, AWS Lambda, AWS EC2, Docker, Python, ANN.

Education

HDP Certified Administrator
Until 2016
Certificates/Courses
Technical Cybernetics, Software Engineering (Master's degree)
Until 2018
Peter the Great St. Petersburg Polytechnic University

Languages

English: Upper-intermediate
Russian: Native