Сергей Протасов
Portfolio
Deeplay
● Led the development of an enterprise-scale ETL system based on Apache Airflow, Kubernetes Jobs, CronJobs, and Deployments, with a Data Warehouse and Data Lake built on ClickHouse, Kafka, and MinIO (see the DAG sketch below).
● Implemented a new Big Data ETL pipeline as team leader, using Flink, PyFlink, Apache Kafka, Protocol Buffers, gRPC, and ClickHouse to consume a high-volume data stream from a data provider.
● Led a team of 7 Python ETL engineers within a larger team of 30 ETL engineers, including Java, TypeScript, alerting, and BI support teams; collaborated with the team lead, tech lead/CTO, DBA, and DevOps team.
● Facilitated team growth through performance reviews, skill development sessions, and code reviews.
● Drove agile development practices, including sprint planning, retrospectives, daily stand-ups, and backlog grooming, to ensure timely delivery of ETL projects.
● Researched and evaluated new technologies, leading to the successful adoption of Airflow 2, newer Pandas versions, and a native ClickHouse tool that doubled performance for complex DAGs.
● Trained 6 new team members on the ETL system, cutting development time roughly fivefold.
● Developed and maintained 4 large, scalable parsers for complex data transformation as team leader.
● Implemented and maintained 15 DAGs, contributing to a 5% increase in department revenue per year.
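A minimal sketch of the kind of Airflow 2 ETL DAG described above, using the TaskFlow API; the DAG name, schedule, and data shapes are illustrative assumptions, not the production values.

```python
# Minimal Airflow 2 (>=2.4) TaskFlow ETL sketch; names are hypothetical.
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    schedule="@hourly",
    start_date=datetime(2023, 1, 1),
    catchup=False,
    tags=["etl", "clickhouse"],
)
def events_to_clickhouse():
    @task
    def extract() -> list[dict]:
        # In production this would read a batch from Kafka / the Data Lake (MinIO).
        return [{"event_id": 1, "value": 42.0}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Placeholder for the pandas-based transformation step.
        return [{**r, "value": r["value"] * 2} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        # Placeholder for an insert into ClickHouse (e.g. via clickhouse-connect).
        print(f"would insert {len(rows)} rows")

    load(transform(extract()))


events_to_clickhouse()
```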
Zuykov and Partners
● Optimized website performance by implementing caching (sketched below), achieving 3x faster page loads and meeting Google PageSpeed metrics.
● Improved search response time by up to 10x by introducing view logic and partitioning search result tables into 42 partitions.
● Implemented CI/CD for the project, reducing deployment time from 1 hour to under 5 minutes.
● Cut troubleshooting time from days to under 1 hour and improved log readability by integrating Elasticsearch, Logstash, and Kibana (ELK).
● Developed a backend from scratch using Flask, MongoDB, Elasticsearch, and Redis with a remote team of 5 developers, ensuring a robust and scalable architecture.
● Integrated metrics and monitoring for the backend.
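A minimal sketch of Redis-backed response caching in Flask, in the spirit of the optimization above; the endpoint, key format, and TTL are illustrative assumptions.

```python
# Cache expensive query results in Redis; hits skip the backend query entirely.
import json

import redis
from flask import Flask

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379, db=0)

CACHE_TTL = 300  # seconds; tune per endpoint


@app.route("/search/<term>")
def search(term: str):
    key = f"search:{term}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database round-trip

    # Placeholder for the real MongoDB/Elasticsearch query.
    result = {"term": term, "hits": []}
    cache.setex(key, CACHE_TTL, json.dumps(result))
    return result
```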
RAFT
1. OpenAPI Generator System Development
   1. Developed a system from scratch using APIs and worker instances to interact with the OpenAPI SDK Generator.
   2. Reduced developers' wait time for SDK generation from hours to instant, on-demand generation.
   3. Engineered end-to-end CircleCI CI/CD scripts from scratch for rapid deployment on Google Kubernetes Engine (GKE).
2. Google Dataflow Pipeline and Framework Development
   1. Developed a Google Dataflow pipeline with Apache Beam to parse XML CDA files and store them in Google BigQuery for optimized data processing and analysis.
3. Apache Airflow Development
   1. Led the ground-up development of a new Apache Airflow sub-framework for the ETL, analytics, and business intelligence departments.
   2. Created new Operators, Hooks, Sensors, and data quality / data alerting tools for Apache Airflow, improving code quality and delivery efficiency (a custom operator is sketched below).
   3. Provided comprehensive documentation and training to team members for the new framework.
   4. Migrated 100 Apache Airflow DAGs from MS SQL and ClickHouse to Greenplum in time for the MS SQL license revocation deadline.
4. AsyncIO Microservice for Video File Parsing
   1. Developed features for a microservice using asyncio, aiohttp, and PostgreSQL for video editing (see the endpoint sketch below).
   2. Contributed to development and maintenance of the microservice, ensuring high-quality deliverables and smooth operation.
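A sketch of a custom Airflow operator of the kind described in item 3; the data-quality check, table name, and threshold are hypothetical examples, with the database call stubbed out to keep it self-contained.

```python
# Custom Airflow operator sketch: fail a DAG run on a failed row-count check.
from airflow.exceptions import AirflowFailException
from airflow.models import BaseOperator


class RowCountCheckOperator(BaseOperator):
    """Fail the task if a table has fewer rows than expected."""

    def __init__(self, table: str, min_rows: int, **kwargs):
        super().__init__(**kwargs)
        self.table = table
        self.min_rows = min_rows

    def execute(self, context):
        # In production this would query Greenplum/ClickHouse via a Hook;
        # here the count is stubbed to keep the sketch runnable.
        row_count = self._fetch_row_count()
        if row_count < self.min_rows:
            raise AirflowFailException(
                f"{self.table}: {row_count} rows < required {self.min_rows}"
            )
        self.log.info("%s passed with %d rows", self.table, row_count)
        return row_count

    def _fetch_row_count(self) -> int:
        return 0  # placeholder for e.g. a Hook-based SELECT count(*)
```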
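And a minimal asyncio/aiohttp endpoint in the spirit of the video parsing microservice in item 4; the route, payload shape, and port are illustrative assumptions.

```python
# Minimal aiohttp service sketch: accept a parse request, return a JSON status.
import asyncio

from aiohttp import web


async def parse_video(request: web.Request) -> web.Response:
    payload = await request.json()
    # Stand-in for the real parsing work (media probing, PostgreSQL write, etc.).
    await asyncio.sleep(0)  # yield to the event loop
    return web.json_response({"file": payload.get("file"), "status": "parsed"})


app = web.Application()
app.add_routes([web.post("/parse", parse_video)])

if __name__ == "__main__":
    web.run_app(app, port=8080)
```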