Registered: 22.05.2024

Skills

Python
Scala
Spark
Hadoop
AWS
SQL
RDBMS
DevOps Knowledge
Git
HBase
HDFS
Hive
MapReduce
Pig
YARN
Kafka
Zookeeper
Impala
Apache Sqoop
Apache Airflow
Hue
NiFi
Elasticsearch
Talend
SSIS
SSRS
Spark SQL
DataFrames
Datasets
Kubernetes
Docker
Scikit-learn
Pandas
Matplotlib
MySQL
HTML/CSS
XML
JavaScript
JSON
UML (MS Visio)
MS Project
PowerBuilder
Tableau
Power BI
Azure
GCP
Databricks
Visual Studio
Eclipse
Jenkins
GitHub
Snowflake

Work Experience

Data Engineer / Analyst
01.2023 - Present | BCBS
Spark, Kafka, Flink, SQL, Snowflake, Python, SnowSQL, Glue Studio, S3, Redshift, RDS, AWS, Amazon EMR, GitHub, Ansible, Hadoop, Elastic MapReduce, Confidential RDS, Aurora DB, TKGS, Azure, GCP, Spring Boot, Docker, Kubernetes, AWS CodePipeline, Microsoft Power BI, Power Pivot, Power View, CI/CD, Lambda, DynamoDB
● Designed and implemented scalable data pipelines using Spark, Kafka, and Flink for processing large volumes of streaming data (a streaming sketch follows this list).
● Implemented a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL (a migration sketch follows this list).
● Created multiple Glue ETL jobs in Glue Studio, processed the data with various transformations, and loaded it into S3, Redshift, and RDS (a Glue sketch follows this list).
● Performed end-to-end architecture and implementation assessments of AWS services such as Amazon EMR, Redshift, and S3.
● Managed nearly all configuration through GitHub or Ansible.
● Managed Hadoop deployments in the AWS cloud using S3 storage and Elastic MapReduce.
● Configured an AWS Virtual Private Cloud (VPC), NACLs, and a database subnet group to isolate resources within the Confidential RDS and Aurora DB clusters.
● Deployed applications to cloud platforms including TKGS, Azure, GCP, and AWS, adapting to different cloud environments and deployment strategies.
● Developed day-to-day ETL pipelines in and out of the data warehouse and built major regulatory and financial reports using advanced SQL queries in Snowflake.
● Queried and manipulated data with analysis tools such as Excel, Python, R, and SQL.
● Built microservices using Spring Boot, deployed them to Docker containers, and hosted them in Kubernetes clusters.
● Created automated pipelines in AWS CodePipeline to deploy Docker containers to ECS using S3.
● Designed and set up an enterprise data lake supporting analytics, processing, storage, and reporting of rapidly changing data.
● Designed and developed interactive Power BI dashboards and reports, giving stakeholders actionable insight into key performance indicators (KPIs) and business metrics.
● Built data-analysis prototypes with Power BI and Power Pivot, using Power View and Power Map for report visualization.
● Implemented CI/CD pipelines on AWS for efficient deployment and optimized workspace and cluster configurations for performance.
● Designed and developed a data management system on MySQL and managed large datasets with Pandas DataFrames and MySQL.
● Designed and developed security frameworks for fine-grained access control in AWS S3 using Lambda and DynamoDB (an access-control sketch follows this list).
● Used well-planned streaming for real-time data processing and integrated Databricks with various ETL/orchestration tools.
● Implemented machine learning algorithms in Python for predictive analytics and used AWS EMR for data transformation and movement.
● Prepared and delivered stakeholder presentations summarizing A/B test methodologies, results, and actionable recommendations.
● Strong understanding of the software development life cycle (SDLC) and project implementation methodology.
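A minimal sketch of the Spark/Kafka streaming pipeline named in the first bullet, assuming a PySpark Structured Streaming job; the broker address, topic name, and event schema are illustrative placeholders, not the actual configuration.

    # Hypothetical streaming job: consume JSON events from Kafka, parse,
    # window-aggregate, and write out. All names are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col, window
    from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("claims-stream").getOrCreate()

    schema = (StructType()
              .add("claim_id", StringType())
              .add("amount", DoubleType())
              .add("event_time", TimestampType()))

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
              .option("subscribe", "claims")                     # assumed topic
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Total claim amount per 5-minute event-time window, 10-minute watermark
    totals = (events
              .withWatermark("event_time", "10 minutes")
              .groupBy(window("event_time", "5 minutes"))
              .sum("amount"))

    # A production job would write to S3/Redshift; console keeps the sketch simple
    query = totals.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()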
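The one-time SQL Server-to-Snowflake migration could be sketched as below, assuming pyodbc for extraction and the Snowflake Python connector for staging and loading; the server, credentials, and table names are placeholders.

    # Hypothetical migration: export a SQL Server table to CSV, stage it in
    # Snowflake's table stage, then COPY it in. Names are placeholders.
    import csv
    import pyodbc
    import snowflake.connector

    # 1) Extract from SQL Server
    src = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                         "SERVER=sqlserver-host;DATABASE=members;"
                         "UID=etl_user;PWD=***")
    cur = src.cursor()
    cur.execute("SELECT * FROM dbo.members")
    with open("/tmp/members.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([c[0] for c in cur.description])  # header row
        writer.writerows(cur.fetchall())

    # 2) Stage the file and load it into Snowflake
    sf = snowflake.connector.connect(account="myaccount", user="etl_user",
                                     password="***", warehouse="ETL_WH",
                                     database="ANALYTICS", schema="PUBLIC")
    cs = sf.cursor()
    cs.execute("PUT file:///tmp/members.csv @%MEMBERS AUTO_COMPRESS=TRUE")
    cs.execute("COPY INTO MEMBERS FILE_FORMAT=(TYPE=CSV SKIP_HEADER=1)")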
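A Glue Studio job like the one in the third bullet generates a PySpark script along these lines; the catalog database, table, and S3 path are assumptions.

    # Hypothetical Glue ETL job: read from the Glue Data Catalog, rename and
    # retype columns, write Parquet to S3. Catalog/bucket names are placeholders.
    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue = GlueContext(SparkContext.getOrCreate())
    job = Job(glue)
    job.init(args["JOB_NAME"], args)

    src = glue.create_dynamic_frame.from_catalog(
        database="claims_db", table_name="raw_claims")  # assumed catalog entries

    mapped = ApplyMapping.apply(frame=src, mappings=[
        ("claim_id", "string", "claim_id", "string"),
        ("amt", "double", "amount", "double"),
    ])

    glue.write_dynamic_frame.from_options(
        frame=mapped, connection_type="s3",
        connection_options={"path": "s3://example-curated/claims/"},
        format="parquet")
    job.commit()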
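One way to read the Lambda/DynamoDB access-control bullet: a Lambda authorizes each S3 object request against an entitlement table and hands back a short-lived presigned URL. The table name, key schema, and event shape are assumptions for illustration.

    # Hypothetical fine-grained S3 access check backed by DynamoDB.
    import boto3

    dynamodb = boto3.resource("dynamodb")
    s3 = boto3.client("s3")
    table = dynamodb.Table("s3_entitlements")  # assumed table name

    def handler(event, context):
        # Assumed API Gateway event shape with an authorizer principal
        user = event["requestContext"]["authorizer"]["principalId"]
        key = event["queryStringParameters"]["key"]
        item = table.get_item(Key={"user_id": user, "object_key": key}).get("Item")
        if not item or not item.get("allowed"):
            return {"statusCode": 403, "body": "access denied"}
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": "example-secure-bucket", "Key": key},
            ExpiresIn=300)  # link valid for 5 minutes
        return {"statusCode": 200, "body": url}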
Data Engineer / Data Analyst
05.2016 - 06.2021 | Spinix Solutions
ETL, Spark SQL, Azure Databricks, JSON, MongoDB, Spark, Kafka, Flink, CRM, Oracle SQL, PL/SQL, Azure PaaS, Geo-Replication, Pandas, MySQL, Azure Data Lake Storage
● Developed end-to-end Spark applications in Databricks and PySpark for data cleansing, transformation, and aggregation (a cleansing sketch follows this list).
● Developed Splunk data models for data searches, reporting, and dashboards.
● Developed scalable and secure data pipelines for large datasets.
● Designed professional reports and dashboards in Excel and Power BI for management.
● Optimized Spark jobs for better performance and analyzed time-series data for trends.
● Managed and maintained a Splunk environment with high-volume ingestion from many custom devices and applications: a daily rate of 25-35 TB of compressed logs, roughly 50-60 TB per day before compression.
● Troubleshot issues with ETL loads (SSIS and stored procedures) and cube processing.
● Wrote and debugged T-SQL queries, SSRS reports, SSIS packages, and stored procedures.
● Conducted training sessions for end users on CRM best practices and data management, and managed Tableau Server, ensuring the availability and security of Tableau dashboards for end users.
● Conducted ad-hoc analysis and presented findings to stakeholders and senior leadership.
● Implemented a Lakehouse architecture with Delta tables for efficient data storage.
● Developed ETL solutions using Spark SQL in Azure Databricks to extract, transform, and aggregate data from multiple file formats and sources, uncovering insights into customer usage patterns.
● Loaded JSON datasets into MongoDB and validated the data using the Mongo shell (a MongoDB sketch follows this list).
● Loaded aggregated data into MongoDB for dashboard reporting and worked on MongoDB schema/document modeling, querying, indexing, and tuning.
● Performed data cleaning and profiling to ensure data correctness and consistency.
● Designed and implemented scalable data pipelines using Spark, Kafka, and Flink for processing large volumes of streaming data, incorporating advanced information-security practices for vulnerability identification and remediation.
● Managed CRM data for key accounts, providing actionable insights to sales and promotion teams.
● Created and maintained automated reports and dashboards to monitor KPIs and track business metrics.
● Implemented customized Oracle reports using Oracle SQL/PL/SQL techniques such as SQL*Plus reports.
● Worked on Azure PaaS components such as Azure Data Factory, Databricks, Azure Logic Apps, Application Insights, Azure Data Lake, Azure Data Lake Analytics, virtual machines, geo-replication, and App Services.
● Managed large datasets using Pandas DataFrames and MySQL and migrated data from traditional database systems to Azure databases.
● Built, deployed, and monitored batch and near-real-time data pipelines loading structured and unstructured data into Azure Data Lake Storage.
● Designed and created backend data-access modules using PL/SQL stored procedures in Oracle; wrote and executed MySQL queries from Python using the MySQL Connector/Python and MySQLdb packages (a connector sketch follows this list).
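A minimal sketch of the Databricks/PySpark cleansing-and-aggregation work from the first bullet; the mount path, column names, and table name are placeholders.

    # Hypothetical cleansing job: drop bad rows, normalize a column,
    # aggregate, and persist as a Delta table (the Lakehouse layer above).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, trim, lower, avg

    spark = SparkSession.builder.appName("cleanse-usage").getOrCreate()

    raw = spark.read.json("/mnt/raw/usage/")  # assumed mount point
    clean = (raw
             .dropna(subset=["customer_id", "usage_mb"])
             .withColumn("region", lower(trim(col("region"))))
             .filter(col("usage_mb") >= 0))

    # Average usage per region, written as a Delta table for reporting
    summary = clean.groupBy("region").agg(avg("usage_mb").alias("avg_usage_mb"))
    summary.write.format("delta").mode("overwrite").saveAsTable("usage_summary")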
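Loading a JSON dataset into MongoDB, as in the bullet above, might look like this with pymongo; the database, collection, and file names are assumptions.

    # Hypothetical JSON load plus a sanity check mirroring the Mongo shell's
    # db.usage_events.countDocuments({}).
    import json
    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")
    coll = client["reporting"]["usage_events"]

    with open("usage_events.json") as f:
        docs = json.load(f)  # expects a JSON array of documents
    coll.insert_many(docs)

    # Index the field the dashboard filters on
    coll.create_index([("customer_id", ASCENDING)])
    print(coll.count_documents({}))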
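The last bullet's MySQL-from-Python pattern, sketched with MySQL Connector/Python; the host, credentials, and orders table are placeholders.

    # Hypothetical parameterized query through mysql-connector-python.
    import mysql.connector

    conn = mysql.connector.connect(host="db-host", user="report_user",
                                   password="***", database="sales")
    cur = conn.cursor()
    cur.execute("SELECT account_id, SUM(amount) FROM orders "
                "WHERE order_date >= %s GROUP BY account_id", ("2021-01-01",))
    for account_id, total in cur.fetchall():
        print(account_id, total)
    cur.close()
    conn.close()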

Education

Computer Science (Master's)
2021 - 2022
University of Missouri Kansas City
Computer Science (Bachelor's)
2012 - 2016
Vellore Institute of Technology

Languages

Tamil: Intermediate
English: Advanced