Sharif Mulani

Senior

Registered: 26.02.2024

Specialization: Technical Architect (Gen-AI/ML)

● Overall 19+ years of experience designing and developing B2B and B2C software applications across legacy systems, distributed computing, and, most recently, Generative AI (LLM / FM) and machine / deep learning platforms.
● 9 years of experience in the AI / ML tech stack (Generative AI (LLM, FM), data science, machine learning, deep learning), with exposure to the RASA NLU bot framework, Azure ML, and the C3.AI platform.
● 2 years of experience in big-data analytics (Cloudera, Hortonworks, Databricks).
● Additional experience / knowledge in RDBMS (Oracle, Db2), OLTP/OLAP systems (SAP HANA), NoSQL databases (MongoDB), and graph databases (Neo4j, GDSL - Graph Data Science Library).
● July 2023: SPOT award for best performance from NECI.
● Sep 2013: STAR award for best performance from Xoriant.
● July 2010: KUDOS for a job well done from SunGard.

Machine Learning, Java, Deep Learning, Spark, Linux, PyTorch, SAP, Python, Oracle, NoSQL

Portfolio

Aligned Automation

2. DELL - Address Validation & Geocoding (Python, NLP / DL). This project corrects bad or incorrect geographic addresses from LATAM countries such as Mexico, Argentina, and Chile, using the Google Maps and OpenStreetMap APIs together with sentence embeddings from SBERT (Sentence-BERT).
● Solution architecture, design, and implementation on Azure Blob Storage, Azure ML, and the Azure DevOps CI/CD framework.
● An NLP model (SimCSE: Simple Contrastive Learning of Sentence Embeddings) was trained on a Tesla V100 GPU system on a domain-specific corpus (Spanish public-domain addresses) for the downstream semantic textual search task. It was used to store pre-validated Mexico address embeddings in an Elasticsearch 8.0 dense-vector database and to search them by cosine similarity.
● The solution was exposed as a RESTful service through the Flask API framework and connected to a Node.js / React UI for a better user experience.
● Project management: handled a team of data scientists / machine learning engineers, data engineers, and test engineers.

SunGard Global Solutions

1. NEC Japan - GTC Framework Development (Python, PyTorch DL, NLP). This project developed a Generic Text Classifier (GTC) framework that offers (a) training (using only the architecture) and additional training (using the architecture and its weights) of various pre-trained embedding models on a custom dataset, (b) several validation methods (hold-out, k-fold), and (c) building a classification model on top of the embedding model from (a), with custom network layers defined through configuration.
● The ask was to build a GTC framework in PyTorch that supports various customizations (custom data, custom embedding models, and custom classification models).
● Led the framework design and development, including defining the framework architecture, selecting the AWS EC2 (GPU) infrastructure, and writing the GTC framework user manual.
● Various Transformer-based embedding models (BERT, BART, LaBSE, GPT-2) and pre-defined classification models (AutoModelForSequenceClassification) were trained and tested on a custom dataset.
● Received the SPOT award for developing the framework in a short span of time.
2. Generative AI based POC and Proposal (Python, Generative AI/LLM, LangChain, Azure OpenAI).
● Contributed to a project proposal, based on a POC, for a Q&A bot that answers banking customer queries using the Azure OpenAI API service.
● In-house POC: built a Q&A bot with the LangChain framework to answer HR policy related queries.

Aligned Automation

Skills

Python

PyTorch

TensorFlow

Keras

NLTK

SciKit-Learn

SciPy

Seaborn

Matplotlib

Pandas

NumPy

R (3.3.1)

R-Studio

Lime/Shap Library

Azure ML

C3.AI Platform

Azure Data Factory

Databricks

DevOps

Hadoop 2.0 Cloudera CDH-5.4

HDP-2.3

Hive

HBase

Zookeeper

PySpark

Apache Zeppelin

Flume

Hue

SAP HANA Vora

GPT-3.5/4

OpenLLaMA

Nous-Hermes-Llama2-13b

Alpaca

Vicuna

LangChain

Azure OpenAI API

RAG

PEFT (LoRA)

BERT/SBERT

LaBSE

BART

SimCSE

GPT-2/3

AI/ML Ops (MLflow, Azure MLOps)

Chatbot w/ RASA NLU

Lasso Regression

Random Forest

XGBoost

Naive Bayes

K-Means

Isolation Forest

ANN

RNN

LSTM

Neo4j

ADLS-Gen2

SAP S/4 HANA

Teradata

MongoDB

DB2v10

Oracle10g

R-Studio (0.99)

HANA-Studio (2.3.8)

PyCharm (2018.3.5)

Sublime Text (3.1.1)

Spyder

Jupyter Notebook

Eclipse (Mars 2.0)

JIRA

Issue Tracking

Task Management

Scrum Dashboard

Docker

GitHub

Jenkins

Maven Build Management

Sphinx

Roxygen

Unix

Linux (Debian, Ubuntu, CentOS, Red Hat, SUSE)

Windows

Work Experience

AI/ML Technical Architect

06.2023 - 10.2023 |NEC India Corporation

Python, PyTorch DL, NLP, Generative AI/LLM, LangChain, Azure OpenAI

Manager/ Principal Data Scientist

10.2021 - 02.2023 |Aligned Automation

Python, PyTorch, Generative AI / LLM, Spark ML, EDA / Graph Visualization

1. SAP SD - Information Extraction (Python, PyTorch, Generative AI / LLM). This project extracts information from SAP business requirements (supplied as a Q&A dataset for the SAP SD module) using Generative AI approaches: one-shot and few-shot prompting, Tree of Thoughts, static and dynamic prompt engineering, Alpaca-style prompt engineering, auto prompt tuning, etc.
● The ask was to extract keywords (single or multiple) from SAP business requirements supplied in the form of questions and answers.
● The existing SAP implementation AI tool maps each Q&A to the process element and field name of a specific SAP SD BDC screen. The field value, however, has to be extracted from the question and answer through prompt engineering.
● Various LLMs (Falcon, Vicuna, and Llama-based models such as OpenLLaMA and Nous-Hermes-Llama) were used with these prompt engineering techniques (one-shot, few-shot, Tree of Thoughts, auto prompt tuning).
● Contributed to the research and led the team of NLP prompt engineers.
2. DELL - Address Validation & Geocoding (Python, NLP / DL). This project corrects bad or incorrect geographic addresses from LATAM countries such as Mexico, Argentina, and Chile, using the Google Maps and OpenStreetMap APIs together with sentence embeddings from SBERT (Sentence-BERT).
● Solution architecture, design, and implementation on Azure Blob Storage, Azure ML, and the Azure DevOps CI/CD framework.
● An NLP model (SimCSE: Simple Contrastive Learning of Sentence Embeddings) was trained on a Tesla V100 GPU system on a domain-specific corpus (Spanish public-domain addresses) for the downstream semantic textual search task. It was used to store pre-validated Mexico address embeddings in an Elasticsearch 8.0 dense-vector database and to search them by cosine similarity.
● The solution was exposed as a RESTful service through the Flask API framework and connected to a Node.js / React UI for a better user experience.
● Project management: handled a team of data scientists / machine learning engineers, data engineers, and test engineers.
3. DELL - Asset Overheating Classification (Python, Spark ML, EDA / Graph Visualization). This project processes large-scale telemetry data (Azure Databricks platform with Spark MLlib) and establishes, through EDA and ML, the MOI parameters that contribute to asset classification (Overheating vs. Non-Overheating). It provides insights to the Tech Support persona to avoid unrealistic or unjustified dispatches.
● Led the work on establishing threshold values for the MOI parameters and their persistence duration through EDA.
● Calculated the weightage of the parameters found in EDA iteratively, using a statistical approach, and cross-checked them against assets dispatched for overheating.
● Highlighted parameters as contributing factors based on the correlation matrix.
● Classified assets as Overheating vs. Non-Overheating through empirical methods, based on the MOI parameters' threshold, persistence, and weightage values.
● Continued building an ML model for asset classification and measuring its performance.
● Ingested the data and benchmark thresholds into a Neo4j graph for drill-down analysis through Neo4j Bloom.
● Project management: handled a team of data scientists and graph database engineers.

Lead Machine Learning Engineer

05.2021 - 09.2021 |Savart

Python, Deep Learning, NLP

1. AI-Enabled Stock Advisory/Recommendation (Python, Deep Learning, NLP). This project builds an advisory app for securities traded on different stock exchanges.
● Implemented the qualitative analysis as an NLP project on companies' public data, such as annual reports, call transcript PDFs (text, images, tables), and news articles, with a BERT-based downstream Q&A task feeding the final scoring model.
● Built the scoring model on top of the BERT Q&A downstream task.
● Built a chatbot with the RASA NLU framework as the interface to the backend Q&A model.

Sr. Software Engineer ML/DS/Bigdata

10.2012 - 11.2020 |Xoriant Solutions Pvt. Ltd.

Python, Deep Learning, NLP, R, Machine Learning, Hadoop Big Data, PySpark, Java, Hive

1. SAP Ariba - Text Commodity Classifier (Python, Deep Learning, NLP). This project developed a commodity text classifier that assigns newly created commodities to their respective categories.
● Initially implemented a basic Multinomial Naïve Bayes model with tf-idf vectorization, using the chi-square test to select the most relevant word features for this multiclass text classification.
● Later built a bidirectional LSTM model: fitted word2vec on the training corpus using gensim and created a word embedding matrix that was used as the weights of the Keras Embedding layer.
● Finally implemented a transfer learning model (BERT) for this multiclass text classification.
2. SAP Ariba - Risk Quantification and Predictions (R, Machine Learning). This project applied data science in R-3.3.1, with modeling techniques such as linear regression and lasso regression and statistical methods such as interpolation, to calculate and list potential and categorical risk scores / exposure, with their contributing factors, on the Supplier 360 page of the Supplier Risk Management application, which offers risk insights for each supplier in the procurement process. It considers various news feeds (e.g., bankruptcy, lawsuit, disaster).
● Built and implemented the RISK EXPOSURE model in R-3.3.1 through the different data science phases (understanding the business problem, data collection / cleaning, EDA / data modeling, validation, visualization, deployment, and optimization).
● Created a reporting API through an R endpoint and made it RESTful through an OpenCPU instance, so that it can be consumed by a Java-based application and tested with the Postman app.
● Created schemas, stored procedures, and triggers in HANA (SQL).
3. SAP HANA Vora Evaluation (Hadoop Big Data, PySpark). This project evaluated the SAP HANA Vora big data product. It involved setting up SAP HANA Vora (dev) edition, built on the Hortonworks Hadoop distribution, on an AWS instance; ingesting HVAC sensor data (streamed by Apache Flume) and fact table data (loaded via Sqoop) into the Hadoop data lake; and evaluating SAP HANA Vora's capabilities for building data hierarchies on raw HDFS data and for interactive, drill-down, OLAP-style analytics, with PySpark code executed and results visualized in the web-based Apache Zeppelin notebook. Vora virtual tables in Hadoop can further be used as a data source in SAP BI tools such as Lumira.
● Completed e-learning, set up the AWS instance, installed SAP HANA Vora (dev) edition built on Hortonworks HDP-2.3, and troubleshot and monitored the instance with Apache Ambari.
● Studied the architecture and working of SAP HANA Vora on Hadoop and its integration with the SAP HANA in-memory computing framework.
● Ingested sample data into the Hadoop data lake and evaluated various SAP HANA Vora features (building data hierarchies and running interactive analytics in Apache Zeppelin) using Spark SQL.
4. Data Analysis Migration from QlikView to Hadoop (Java, Python, Big Data, Hive). This project migrated sales-related reports written in QlikView's SQL-like language to Hive and Impala SQL. The client has several product categories in the mobile domain, each with different service and contract models. The client's sales team relies on critical data analysis of its massive customer base and millions of signed contracts to find lucrative revenue sources in the form of potential service agreements. This data also helps them with revenue projections for the current and upcoming financial years.
● Interacted directly with the client for requirement gathering and analysis; involved in design.
● Imported and exported data into HDFS and Hive using Sqoop.
● Converted QlikView scripts into optimized Hive-based or MPP-SQL-based scripts.
● Applied various performance tuning techniques (file formats: Parquet, ORC; data modelling: partitioning, bucketing) and created UDFs wherever required.

Software Engineer

05.2006 - 06.2012 |SunGard Global Solutions

Asset Management Legacy System, C++ / Java / Unix, Oracle 10g, ProC

Global Plus: SunGard's asset management, custody, and accounting platform, used by financial services institutions across the world to manage their trust, private banking.
● Analysis, design, development, testing, and bug fixing; mentoring teammates on product features and related functionality.

Software Engineer

06.2004 - 04.2006 |OPUS Software Solution Pvt. Ltd.

UNIX, C

NCR - Application Support.
● Ported the NCR-IPCS6000 cheque reader/sorter product, written in C, from an NCR MP-RAS (UNIX) box to a Sun SPARC Solaris 5.8 (UNIX) box. Completed UAT successfully at NCR (Mumbai) and at NCR (Japan) end client MITSUBISHI Bank.

Education

Math & Computer Application (Bachelor's)

Until 2001

Mumbai University

Specialist

Until 1996

MH State - SCC

Engineering

Until 1998

MH State - HSC (Mumbai Divisional Board)

Postgraduate (MCA) degree with CPI 3/6 in Computer Science from PUCSD (Pune University Computer Science Department), Pune.

Until 2004

Pune University Computer Science Department

Languages

English: Fluent