Registered: 22.05.2024

Skills

Python
Scala
Spark
Hadoop
AWS
SQL
RDBMS
DevOps Knowledge
Git
HBase
HDFS
Hive
MapReduce
Pig
YARN
Kafka
Zookeeper
Impala
Apache Sqoop
Apache Airflow
Hue
NiFi
Elasticsearch
Talend
SSIS
SSRS
Spark SQL
DataFrames
Datasets
Kubernetes
Docker
Scikit-learn
Pandas
Matplotlib
MySQL
HTML/CSS
XML
JavaScript
JSON
UML (MS Visio)
MS Project
PowerBuilder
Tableau
Power BI
Azure
GCP
Databricks
Visual Studio
Eclipse
Jenkins
GitHub
Snowflake

Work Experience

Data Engineer / Analyst
01.2023 - Present | BCBS
Spark, Kafka, Flink, SQL, Snowflake, Python, SnowSQL, Glue Studio, S3, Redshift, RDS, AWS, Amazon EMR, GitHub, Ansible, Hadoop, Elastic MapReduce, Confidential RDS, Aurora DB, TKGS, Azure, GCP, Spring Boot, Docker, Kubernetes, AWS CodePipeline, Microsoft Power BI, Power Pivot, Power View, CI/CD, Lambda, DynamoDB
● Designed and implemented scalable data pipelines using Spark, Kafka, and Flink for processing large volumes of streaming data (a streaming sketch follows this list).
● Implemented a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL (a migration sketch follows this list).
● Created multiple Glue ETL jobs in Glue Studio, processed the data with various transformations, and loaded it into S3, Redshift, and RDS (a Glue sketch follows this list).
● Performed end-to-end architecture and implementation assessments of AWS services such as Amazon EMR, Redshift, and S3.
● Managed nearly all configuration through GitHub or Ansible.
● Managed Hadoop deployments in the AWS cloud using S3 storage and Elastic MapReduce.
● Configured an AWS Virtual Private Cloud (VPC), NACLs, and a database subnet group to isolate resources within the Confidential RDS and Aurora DB clusters.
● Deployed applications to cloud platforms including TKGS, Azure, GCP, and AWS, adapting to different cloud environments and deployment strategies.
● Developed day-to-day ETL pipelines in and out of the data warehouse and built major regulatory and financial reports using advanced SQL queries in Snowflake.
● Queried and manipulated data with analysis tools such as Excel, Python, R, and SQL.
● Built microservices using Spring Boot, deployed them to Docker containers, and hosted them in Kubernetes clusters.
● Created automated pipelines in AWS CodePipeline to deploy Docker containers to ECS using S3.
● Designed and set up an enterprise data lake supporting analytics, processing, storage, and reporting of rapidly changing data.
● Designed and developed interactive Power BI dashboards and reports, giving stakeholders actionable insight into key performance indicators (KPIs) and business metrics.
● Built data-analysis prototypes with Power BI and Power Pivot, using Power View and Power Map for report visualization.
● Implemented CI/CD pipelines on AWS for efficient deployment and optimized workspace and cluster configurations for performance.
● Designed and developed a data management system on MySQL and managed large datasets with Pandas DataFrames and MySQL.
● Designed and developed security frameworks for fine-grained access control in AWS S3 using Lambda and DynamoDB (an access-control sketch follows this list).
● Used well-planned streaming for real-time data processing and integrated Databricks with various ETL/orchestration tools.
● Implemented machine learning algorithms in Python for predictive analytics and used AWS EMR for data transformation and movement.
● Prepared and delivered stakeholder presentations summarizing A/B test methodologies, results, and actionable recommendations.
● Strong understanding of the software development life cycle (SDLC) and project implementation methodology.
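A minimal sketch of the Spark/Kafka streaming pipeline named in the first bullet, assuming a PySpark Structured Streaming job; the broker address, topic name, and event schema are illustrative placeholders, not the actual configuration.

    # Hypothetical streaming job: consume JSON events from Kafka, parse,
    # window-aggregate, and write out. All names are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col, window
    from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("claims-stream").getOrCreate()

    schema = (StructType()
              .add("claim_id", StringType())
              .add("amount", DoubleType())
              .add("event_time", TimestampType()))

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
              .option("subscribe", "claims")                     # assumed topic
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Total claim amount per 5-minute event-time window, 10-minute watermark
    totals = (events
              .withWatermark("event_time", "10 minutes")
              .groupBy(window("event_time", "5 minutes"))
              .sum("amount"))

    # A production job would write to S3/Redshift; console keeps the sketch simple
    query = totals.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()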
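The one-time SQL Server-to-Snowflake migration could be sketched as below, assuming pyodbc for extraction and the Snowflake Python connector for staging and loading; the server, credentials, and table names are placeholders.

    # Hypothetical migration: export a SQL Server table to CSV, stage it in
    # Snowflake's table stage, then COPY it in. Names are placeholders.
    import csv
    import pyodbc
    import snowflake.connector

    # 1) Extract from SQL Server
    src = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                         "SERVER=sqlserver-host;DATABASE=members;"
                         "UID=etl_user;PWD=***")
    cur = src.cursor()
    cur.execute("SELECT * FROM dbo.members")
    with open("/tmp/members.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([c[0] for c in cur.description])  # header row
        writer.writerows(cur.fetchall())

    # 2) Stage the file and load it into Snowflake
    sf = snowflake.connector.connect(account="myaccount", user="etl_user",
                                     password="***", warehouse="ETL_WH",
                                     database="ANALYTICS", schema="PUBLIC")
    cs = sf.cursor()
    cs.execute("PUT file:///tmp/members.csv @%MEMBERS AUTO_COMPRESS=TRUE")
    cs.execute("COPY INTO MEMBERS FILE_FORMAT=(TYPE=CSV SKIP_HEADER=1)")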
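A Glue Studio job like the one in the third bullet generates a PySpark script along these lines; the catalog database, table, and S3 path are assumptions.

    # Hypothetical Glue ETL job: read from the Glue Data Catalog, rename and
    # retype columns, write Parquet to S3. Catalog/bucket names are placeholders.
    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue = GlueContext(SparkContext.getOrCreate())
    job = Job(glue)
    job.init(args["JOB_NAME"], args)

    src = glue.create_dynamic_frame.from_catalog(
        database="claims_db", table_name="raw_claims")  # assumed catalog entries

    mapped = ApplyMapping.apply(frame=src, mappings=[
        ("claim_id", "string", "claim_id", "string"),
        ("amt", "double", "amount", "double"),
    ])

    glue.write_dynamic_frame.from_options(
        frame=mapped, connection_type="s3",
        connection_options={"path": "s3://example-curated/claims/"},
        format="parquet")
    job.commit()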
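One way to read the Lambda/DynamoDB access-control bullet: a Lambda authorizes each S3 object request against an entitlement table and hands back a short-lived presigned URL. The table name, key schema, and event shape are assumptions for illustration.

    # Hypothetical fine-grained S3 access check backed by DynamoDB.
    import boto3

    dynamodb = boto3.resource("dynamodb")
    s3 = boto3.client("s3")
    table = dynamodb.Table("s3_entitlements")  # assumed table name

    def handler(event, context):
        # Assumed API Gateway event shape with an authorizer principal
        user = event["requestContext"]["authorizer"]["principalId"]
        key = event["queryStringParameters"]["key"]
        item = table.get_item(Key={"user_id": user, "object_key": key}).get("Item")
        if not item or not item.get("allowed"):
            return {"statusCode": 403, "body": "access denied"}
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": "example-secure-bucket", "Key": key},
            ExpiresIn=300)  # link valid for 5 minutes
        return {"statusCode": 200, "body": url}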
Data Engineer / Data Analyst
05.2016 - 06.2021 | Spinix Solutions
ETL, Spark SQL, Azure Databricks, JSON, MongoDB, Spark, Kafka, Flink, CRM, Oracle SQL, PL/SQL, Azure PaaS, Geo-Replication, Pandas, MySQL, Azure Data Lake Storage
● Developed end-to-end Spark applications in Databricks and PySpark for data cleansing, transformation, and aggregation (a cleansing sketch follows this list).
● Developed Splunk data models for data searches, reporting, and dashboards.
● Developed scalable and secure data pipelines for large datasets.
● Designed professional reports and dashboards in Excel and Power BI for management.
● Optimized Spark jobs for better performance and analyzed time-series data for trends.
● Managed and maintained a Splunk environment with high-volume ingestion from many custom devices and applications: a daily rate of 25-35 TB of compressed logs, roughly 50-60 TB per day before compression.
● Troubleshot issues with ETL loads (SSIS and stored procedures) and cube processing.
● Wrote and debugged T-SQL queries, SSRS reports, SSIS packages, and stored procedures.
● Conducted training sessions for end users on CRM best practices and data management, and managed Tableau Server, ensuring the availability and security of Tableau dashboards for end users.
● Conducted ad-hoc analysis and presented findings to stakeholders and senior leadership.
● Implemented a Lakehouse architecture with Delta tables for efficient data storage.
● Developed ETL solutions using Spark SQL in Azure Databricks to extract, transform, and aggregate data from multiple file formats and sources, uncovering insights into customer usage patterns.
● Loaded JSON datasets into MongoDB and validated the data using the Mongo shell (a MongoDB sketch follows this list).
● Loaded aggregated data into MongoDB for dashboard reporting and worked on MongoDB schema/document modeling, querying, indexing, and tuning.
● Performed data cleaning and profiling to ensure data correctness and consistency.
● Designed and implemented scalable data pipelines using Spark, Kafka, and Flink for processing large volumes of streaming data, incorporating advanced information-security practices for vulnerability identification and remediation.
● Managed CRM data for key accounts, providing actionable insights to sales and promotion teams.
● Created and maintained automated reports and dashboards to monitor KPIs and track business metrics.
● Implemented customized Oracle reports using Oracle SQL/PL/SQL techniques such as SQL*Plus reports.
● Worked on Azure PaaS components such as Azure Data Factory, Databricks, Azure Logic Apps, Application Insights, Azure Data Lake, Azure Data Lake Analytics, virtual machines, geo-replication, and App Services.
● Managed large datasets using Pandas DataFrames and MySQL and migrated data from traditional database systems to Azure databases.
● Built, deployed, and monitored batch and near-real-time data pipelines loading structured and unstructured data into Azure Data Lake Storage.
● Designed and created backend data-access modules using PL/SQL stored procedures in Oracle; wrote and executed MySQL queries from Python using the MySQL Connector/Python and MySQLdb packages (a connector sketch follows this list).
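A minimal sketch of the Databricks/PySpark cleansing-and-aggregation work from the first bullet; the mount path, column names, and table name are placeholders.

    # Hypothetical cleansing job: drop bad rows, normalize a column,
    # aggregate, and persist as a Delta table (the Lakehouse layer above).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, trim, lower, avg

    spark = SparkSession.builder.appName("cleanse-usage").getOrCreate()

    raw = spark.read.json("/mnt/raw/usage/")  # assumed mount point
    clean = (raw
             .dropna(subset=["customer_id", "usage_mb"])
             .withColumn("region", lower(trim(col("region"))))
             .filter(col("usage_mb") >= 0))

    # Average usage per region, written as a Delta table for reporting
    summary = clean.groupBy("region").agg(avg("usage_mb").alias("avg_usage_mb"))
    summary.write.format("delta").mode("overwrite").saveAsTable("usage_summary")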
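Loading a JSON dataset into MongoDB, as in the bullet above, might look like this with pymongo; the database, collection, and file names are assumptions.

    # Hypothetical JSON load plus a sanity check mirroring the Mongo shell's
    # db.usage_events.countDocuments({}).
    import json
    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")
    coll = client["reporting"]["usage_events"]

    with open("usage_events.json") as f:
        docs = json.load(f)  # expects a JSON array of documents
    coll.insert_many(docs)

    # Index the field the dashboard filters on
    coll.create_index([("customer_id", ASCENDING)])
    print(coll.count_documents({}))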
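The last bullet's MySQL-from-Python pattern, sketched with MySQL Connector/Python; the host, credentials, and orders table are placeholders.

    # Hypothetical parameterized query through mysql-connector-python.
    import mysql.connector

    conn = mysql.connector.connect(host="db-host", user="report_user",
                                   password="***", database="sales")
    cur = conn.cursor()
    cur.execute("SELECT account_id, SUM(amount) FROM orders "
                "WHERE order_date >= %s GROUP BY account_id", ("2021-01-01",))
    for account_id, total in cur.fetchall():
        print(account_id, total)
    cur.close()
    conn.close()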

Education

Computer Science (Master's)
2021 - 2022
University of Missouri Kansas City
Computer Science (Bachelor's)
2012 - 2016
Vellore Institute of Technology

Languages

Tamil: Intermediate
English: Advanced