Sheetal Reddy
Portfolio
Michelin
● Gathered business requirements from business partners and subject matter experts.
● Installed and configured a Hadoop cluster on Amazon Web Services (AWS) for POC purposes.
● Involved in implementing a nine-node CDH4 Hadoop cluster on Red Hat Linux.
● Leveraged AWS services such as EC2, RDS, EBS, ELB, Auto Scaling, AMI, and IAM through the AWS console and API integration.
● Created Python scripts that integrated with the Amazon API to control instance operations (see the sketch after this list).
● Extensively used SQL, NumPy, Pandas, scikit-learn, Boost, TensorFlow, Keras, PyTorch, Spark, and Hive for data extraction, analysis, and model building.
● Experience using various R and Python packages such as ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, seaborn, SciPy, matplotlib, Beautiful Soup, and Rpy2.
● Expertise in database programming (SQL, PL/SQL), XML, DB2, Informix, Teradata, database tuning, and query optimization.
● Monitored, scheduled, and automated big data workflows using Apache Airflow.
● Experience designing, developing, and scheduling reports and dashboards using Tableau and Cognos.
● Expertise in performing data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata.
● Good knowledge of NoSQL databases including Apache Cassandra, MongoDB, DynamoDB, CouchDB, and Redis.
● Experience querying relational database management systems, including MySQL, Oracle, and DB2, with SQL and PL/SQL; built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
● Performance tuning of SQL queries and stored procedures.
● Performed fundamental tasks related to the design, construction, monitoring, and maintenance of Microsoft SQL databases.
● Implemented a data lake in Azure Blob Storage, Azure Data Lake, Azure Analytics, and Databricks; loaded data to Azure SQL Data Warehouse using PolyBase and Azure Data Factory.
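Below is a minimal sketch of the kind of Python script referenced above for controlling instance operations through the Amazon API (boto3). The region, tag filter, and function name are illustrative assumptions, not the exact production script.

```python
import boto3

# Region and tag filter are placeholder assumptions for illustration.
ec2 = boto3.client("ec2", region_name="us-east-1")

def stop_tagged_instances(tag_key="Environment", tag_value="poc"):
    """Stop all running EC2 instances that carry the given tag."""
    resp = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        inst["InstanceId"]
        for reservation in resp["Reservations"]
        for inst in reservation["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids
```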
Nevro Corporation
● Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto scaling with AWS CloudFormation.
● Worked with two different datasets, one using HiveQL and the other using Pig Latin.
● Involved in moving raw data between different systems using Apache NiFi.
● Supported continuous storage in AWS using Elastic Block Store, S3, and Glacier; created volumes and configured snapshots for EC2 instances.
● Automated the resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production.
● Keen interest in the newer technology stack that Google Cloud Platform (GCP) adds.
● Used the DataFrame API in Scala to work with distributed collections of data organized into named columns, developing predictive analytics using Apache Spark Scala APIs.
● Programmed in Hive, Spark SQL, Java, C#, and Python to streamline incoming data, build data pipelines that surface useful insights, and orchestrate those pipelines.
● Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
● Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop (see the aggregation sketch after this list).
● Developed Hive queries to pre-process the data required for running the business process.
● Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
● Configured and managed AWS EMR, Google Cloud Dataproc, and Azure HDInsight clusters.
● Enabled scalable and efficient big data processing using Hadoop and Spark.
● Integrated AWS EMR with S3 and Redshift.
● Established a GCP data lake using Google Cloud Storage, BigQuery, and Bigtable, providing seamless and secure data storage and querying capabilities.
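The Spark aggregation work above was done in Scala; the sketch below shows an equivalent DataFrame-style aggregation in PySpark, with the input path, column names, and JDBC connection details as placeholder assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-aggregation").getOrCreate()

# Read raw events (path is a placeholder).
raw = spark.read.parquet("s3a://example-bucket/raw/events/")

# DataFrame-style aggregation: daily totals per customer.
daily = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .groupBy("customer_id", "event_date")
       .agg(
           F.count("*").alias("event_count"),
           F.sum("amount").alias("total_amount"),
       )
)

# Write the aggregates back to an OLTP store over JDBC
# (connection details are placeholders; the original flow used Sqoop).
(daily.write
      .format("jdbc")
      .option("url", "jdbc:mysql://example-host:3306/analytics")
      .option("dbtable", "daily_customer_totals")
      .option("user", "etl_user")
      .option("password", "***")
      .mode("append")
      .save())
```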
Verizon
● Worked extensively with AWS services like EC2, S3, VPC, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, and RDS.
● Integrated Terraform with data services and platforms such as AWS Glue and Amazon Redshift for provisioning and managing data infrastructure.
● Built serverless ETL pipelines using AWS Lambda functions to extract data from source systems, transform it according to business logic, and load it into target data stores (see the sketch after this list).
● Experienced in tuning model hyperparameters using SageMaker's built-in hyperparameter optimization (HPO) functionality.
● Capable of conducting automated experiments to find optimal hyperparameter configurations for improved model performance.
● Spearheaded an end-to-end Snowflake data warehouse implementation, including designing and building the data architecture, setting up schemas, roles, and warehouses, and ensuring optimal data performance.
● Developed Python scripts to parse XML and JSON files and load the data into the Snowflake data warehouse on AWS.
● Implemented and supported data warehousing ETL using Talend.
● Familiar with ethical considerations and best practices in AI and ML, including fairness, transparency, accountability, and privacy.
● Experienced in implementing fairness-aware algorithms and bias mitigation techniques to ensure equitable and responsible AI systems.
● Knowledgeable about StreamSets support for real-time data streaming, CDC (Change Data Capture), and data quality monitoring.
● Familiar with DataStage and its metadata management capabilities, including defining and maintaining metadata repositories, data lineage, and impact analysis; experienced in leveraging metadata to ensure data governance and compliance.
● Utilized Power BI to create various analytical dashboards that help business users gain quick insights from the data.
● Developed Kafka producers and connectors to stream data from source systems into Kafka topics using the Kafka Connect framework or custom implementations.
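A minimal sketch of one serverless ETL step like the Lambda pipelines described above: the function reads a raw JSON object from S3, applies a simple transformation, and writes the result to a curated bucket. Bucket names, field names, and the filtering rule are illustrative assumptions.

```python
import json
import boto3

s3 = boto3.client("s3")
TARGET_BUCKET = "example-curated-bucket"  # placeholder target data store

def lambda_handler(event, context):
    """Triggered by an S3 put event: extract the raw JSON object,
    transform it per a simple business rule, and load it to the curated bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        rows = json.loads(body)

        # Example business rule: keep only completed orders and normalise field names.
        cleaned = [
            {"order_id": r["id"], "amount_usd": r["amount"]}
            for r in rows
            if r.get("status") == "COMPLETED"
        ]

        s3.put_object(
            Bucket=TARGET_BUCKET,
            Key=f"curated/{key}",
            Body=json.dumps(cleaned).encode("utf-8"),
        )
    return {"processed": len(event["Records"])}
```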