Valerii Zhuk
Portfolio
EPAM Systems Inc.
- Software engineering.
- Customer communication.
- Hadoop infrastructure support.
- Interviewing interns and mentoring internal projects.
- Prototyping potential architecture improvements.
1. Data lake development. Healthcare reporting platform.
   Technologies: MongoDB, Apache Spark (Core), Scala, AWS S3, AWS EMR, TeamCity, Sumo Logic.
2. Building an end-to-end data pipeline: ingestion, transformation, and reporting on top of Apache Hive.
   Technologies: Apache Hive, MapReduce, HDFS, Python, Bash, HiveQL, Nexus, Jenkins.
3. Complex data streaming platform handling telemetry data.
   Technologies: Apache Hive, Azure Data Lake, Java, Apache Spark (Spark Dataset API, Spark Streaming), Redis, Azure HDInsight.
4. ETL pipeline development.
   Technologies: Apache Impala, Apache Spark (Spark Dataset API), Apache Kafka, Parquet, Avro, StreamSets, Airflow, Cloudera.
5. Serverless image recognition pipeline development.
   Technologies: AWS S3, AWS Lambda, AWS EC2, Docker, Python, ANN.
Grid Dynamics
1. Data platform for marketing data.
   - Designing and implementing data pipelines.
   - Optimizing ETL pipelines and Spark jobs.
   - Fixing bugs.
2. Cloud data platform for a manufacturer.
   - Architecting and documenting features.
   - Developing features.
   - Integrating ML models into data pipelines.
DataArt
- Software engineering:
  - Bug fixes.
  - Ingestion of new sources.
  - ETL pipeline development.
  - Participating in architecture design and prototyping.