Nikhil Sai
Portfolio
Wells Fargo
● Developed Spark applications in Scala and Python for a regular expression (regex) project in the Hadoop/Hive environment on Linux and Windows big data platforms.
● Designed and developed JavaScript modules and REST APIs on MarkLogic to support complex searches, enhancing enterprise platform integrations.
● Implemented ETL pipelines using Databricks to transform and process large datasets, ensuring efficient data integration and quality (see the sketch after this list).
● Maintained database objects within a software version control system, ensuring version integrity and smooth deployments.
● Migrated data from existing applications and database environments to the new architecture, ensuring data integrity and consistency.
● Developed and maintained Databricks jobs to automate data processing workflows, enhancing data accuracy and timeliness.
● Conducted training sessions and mentored junior engineers on Data Vault modeling and automation, fostering a culture of continuous learning and improvement.
● Architected and built Informatica PowerCenter mappings and workflows, participating in requirements definition, system architecture, and data architecture design.
● Utilized the MarkLogic framework and DynamoDB for real-time data processing, combined with advanced analytics tools such as SAS, R, Python, and other statistical software.
● Implemented data management solutions using Snowflake for over five years, covering the complete lifecycle from architecture design to deployment.
● Handled different file formats such as JSON, XML, and CSV, transforming and ingesting data across multiple platforms to support business intelligence and analytics initiatives.
● Communicated business goals and requirements to create data-driven solutions, enhancing business value.
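Illustrative sketch (not the production code): a minimal PySpark example of the kind of regex-driven Databricks/Hive transformation described above. The database, table, and column names (raw_db.transactions, description, account_no) are hypothetical placeholders.

# Minimal PySpark sketch of a regex cleanup step; assumes a configured Hive metastore.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("regex-cleanup").enableHiveSupport().getOrCreate()

raw = spark.table("raw_db.transactions")  # hypothetical Hive source table

cleaned = (
    raw
    # strip non-printable characters left over from upstream feeds
    .withColumn("description", F.regexp_replace("description", r"[^\x20-\x7E]", ""))
    # extract a 10-digit account number from free text, if present
    .withColumn("account_no", F.regexp_extract("description", r"\b(\d{10})\b", 1))
    .filter(F.col("account_no") != "")
)

cleaned.write.mode("overwrite").saveAsTable("curated_db.transactions_clean")  # hypothetical target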
Nationwide
● Designed the business requirement collection approach based on the project scope and SDLC methodology.
● Created pipelines in Azure Data Factory (ADF) using Linked Services and Datasets to extract, transform, and load data between sources such as Azure SQL Database, Blob Storage, and Azure SQL Data Warehouse, including write-back to source systems.
● Designed and implemented data pipelines and real-time data processing applications using Pub/Sub systems.
● Familiar with standard Azure security tooling such as the Microsoft Defender suite and Microsoft Sentinel, and used scripting languages such as PowerShell, Python, and Bash for automation and security orchestration.
● Worked on a combination of structured and unstructured data from multiple sources and automated the cleaning using Python scripts.
● Implemented CDC mechanisms with Apache Hudi to enable efficient upserts and data consistency in distributed data lake environments, ensuring real-time data availability for downstream analytics (see the sketch after this list).
● Wrote fully parameterized Databricks notebooks and ADF pipelines for efficient code management.
● Designed end-to-end scalable architectures to solve business problems using Azure components such as HDInsight, Data Factory, Data Lake, Storage, and Machine Learning Studio.
● Involved in all steps of the project's reference data approach to MDM; created a data dictionary and source-to-target mappings for the MDM data model.
● Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics and integrated them with other Azure services.
● Designed and implemented scalable data storage architectures using HBase and Cassandra for a real-time analytics platform, improving data ingestion rates by 30% and reducing query response times by 40% through effective schema design and performance tuning.
● In-depth knowledge of SAS Enterprise Guide for data analysis and reporting.
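Illustrative sketch (not the production code): a minimal PySpark example of a Hudi upsert for CDC, as referenced in the list above. Paths, the table name, and the key/precombine columns are hypothetical, and the Hudi Spark bundle is assumed to be on the cluster classpath.

# Minimal PySpark/Hudi sketch of a CDC upsert into a data lake table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-cdc-upsert").getOrCreate()

changes = spark.read.format("json").load("/mnt/landing/policies_cdc/")  # hypothetical CDC feed

hudi_options = {
    "hoodie.table.name": "policies",
    "hoodie.datasource.write.recordkey.field": "policy_id",    # hypothetical record key
    "hoodie.datasource.write.precombine.field": "updated_at",  # latest change wins
    "hoodie.datasource.write.operation": "upsert",
}

(changes.write.format("hudi")
    .options(**hudi_options)
    .mode("append")                    # append mode triggers the Hudi upsert
    .save("/mnt/datalake/policies"))   # hypothetical Hudi table base path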
Johnson & Johnson
● Transformed data from mainframe tables into HDFS and HBase tables using Sqoop.
● Implemented DataStage jobs to process and analyze large volumes of mortgage data, enabling data-driven insights into loan performance, borrower behavior, and market trends.
● Visualized results using dashboards and the Python Seaborn library for data interpretation in deployment; used the HBase REST API to access data for analytics.
● Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
● Handled administration activities using Cloudera Manager.
● Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
● Stored and processed large amounts of data using services such as Google Cloud Storage, Bigtable, and BigQuery to store raw data, process it, and load it into a data warehouse for analysis.
● Built a Python and Apache Beam program, executed on Cloud Dataflow, to run data validation jobs between raw source files and BigQuery tables (see the sketch after this list).
● Automatically scaled up EMR instances based on data volume using AtScale.
● Developed the company's internal CI system, providing a comprehensive API for CI/CD.
● Developed MapReduce programs to process Avro files, perform calculations on the data, and execute map-side joins.
● Imported bulk data into HBase using MapReduce programs.
● Stored streaming data to HDFS and implemented Spark for faster processing of data.
● Created RDDs and DataFrames for the required input data and performed data transformations using PySpark.
● Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
● Migrated tables from RDBMS into Hive using Sqoop and later generated visualizations using Tableau.
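Illustrative sketch (not the production code): a minimal Apache Beam example of the kind of Dataflow validation job described above, comparing the row count of a raw GCS file against its BigQuery target. The project, bucket, dataset, and table names are hypothetical placeholders.

# Minimal Apache Beam row-count validation, runnable on Cloud Dataflow.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    source_count = (
        p
        | "ReadRawFile" >> beam.io.ReadFromText("gs://my-bucket/raw/loans.csv", skip_header_lines=1)
        | "CountSource" >> beam.combiners.Count.Globally()
    )
    target_count = (
        p
        | "ReadBigQuery" >> beam.io.ReadFromBigQuery(table="my-project:mortgage.loans")
        | "CountTarget" >> beam.combiners.Count.Globally()
    )
    (
        (source_count, target_count)
        | "MergeCounts" >> beam.Flatten()
        | "CollectCounts" >> beam.combiners.ToList()
        # equality check is order-independent, so the flatten order does not matter
        | "Compare" >> beam.Map(lambda counts: "MATCH" if counts[0] == counts[1] else "MISMATCH: %s" % counts)
        | "WriteResult" >> beam.io.WriteToText("gs://my-bucket/validation/result")
    )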