Built a dual-encoder vision-language model using CLIP-style contrastive learning to match product images with textual metadata for large-scale e-commerce retrieval (see the training-loss sketch below).
Pretrained on 120K+ image-text pairs with ViT-B/32 as the vision encoder and DistilBERT as the text encoder, improving Recall@10 by 9.6% over a ResNet + bag-of-words baseline.
Implemented in-batch hard-negative mining and momentum-queue sampling to sharpen discrimination between visually similar products (e.g., sneakers vs. running shoes).
Applied data-deduplication and caption-cleaning pipelines with Spark NLP and the OpenCLIP tokenizer, improving text-image alignment accuracy by 6.3%.
Fine-tuned on curated, human-labeled product triplets and category-split validation sets, achieving a 5.3% gain in NDCG@10.
Deployed the retrieval model via FAISS and TorchServe, serving 500K+ embeddings at sub-25 ms latency.
Integrated the new retrieval pipeline into the personalized recommendation stack, yielding a 3.1% CTR uplift in A/B testing.
Monitored training with Weights & Biases, tracking loss curves, recall metrics, and embedding drift as retraining triggers.
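
A minimal sketch of the symmetric in-batch contrastive (InfoNCE) objective behind the dual encoder; the function and argument names are illustrative, and the temperature is CLIP's published default rather than a tuned value:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Every non-matching pair in the batch acts as an in-batch negative.
    """
    # L2-normalize so the dot product equals cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds positive pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy terms.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

Hard-negative mining and the momentum queue would extend this by appending extra negative columns to `logits`; the core objective stays the same.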
Developed a reinforcement learning framework for prompt tuning of task-specific LLMs (e.g., summarization, chain-of-thought QA), enabling response-quality optimization without full fine-tuning.
Used OpenAI GPT-3.5 and Mistral-7B as base LLMs, combined with an RLHF-style reward-modeling pipeline to score outputs on relevance and factuality.
Built the reward model from BERTScore, GPT-4 judgments, and task-specific heuristics such as correct entity match for QA (see the reward sketch below).
Applied PPO (Proximal Policy Optimization) with low-rank adapter (LoRA) layers to update prompts and control tokens.
Achieved an 11.2% gain in factual consistency and a 7.4% reduction in hallucination rate on an internal summarization benchmark compared to the vanilla prompt.
Deployed the optimized prompts into the prompt-template registry used by internal RAG systems, improving downstream response pass@1 by 9.6%.
Monitored prompt-performance drift across live production logs using Langfuse and a custom OpenTelemetry pipeline.
Coordinated with applied scientists to roll out prompt variants in staged A/B buckets, with automated fallback to the baseline under degradation.
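
A simplified sketch of the composite reward described above, assuming BERTScore for semantic similarity plus a verbatim entity-match heuristic; the GPT-4 judgment term is omitted, and the 0.7/0.3 weights are illustrative:

```python
from bert_score import score as bert_score

def qa_reward(candidate: str, reference: str, required_entities: list[str]) -> float:
    """Composite reward for a QA output: semantic similarity plus entity match."""
    # BERTScore F1 between the model output and the reference answer.
    _, _, f1 = bert_score([candidate], [reference], lang="en", verbose=False)
    semantic = f1.item()

    # Fraction of required entities appearing verbatim in the output.
    hits = sum(e.lower() in candidate.lower() for e in required_entities)
    entity_match = hits / max(len(required_entities), 1)

    # Weighted combination; in practice the weights are tuned per task.
    return 0.7 * semantic + 0.3 * entity_match
```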
Developed a sensor fusion model combining LiDAR point cloud, camera images, and radar signals to improve 3D object detection accuracy in urban driving scenarios.
Applied a PointPillars + ResNet-101 architecture with multi-scale attention fusion, improving mean Average Precision (mAP) by 14.6% on the nuScenes benchmark.
Built an efficient data-preprocessing pipeline using ROS 2 + Open3D to align and voxelize sensor inputs, reducing preprocessing latency by 38% (see the voxelization sketch below).
Integrated learned embeddings into a downstream trajectory prediction module, achieving smoother path planning with 23% fewer collision violations in simulation.
Implemented real-time model serving using TensorRT + NVIDIA Triton Inference Server, enabling sub-50 ms inference latency on Jetson Xavier.
Collaborated with simulation engineers to run 10,000+ scenario tests using CARLA, validating model robustness under fog, night, and occlusion conditions.
Supported deployment on a test fleet, contributing to a 22% reduction in disengagements per 1,000 miles on urban routes.
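
A minimal sketch of the alignment and voxelization step using Open3D; the extrinsic matrix, voxel size, and function name are illustrative placeholders for the actual calibration pipeline:

```python
import numpy as np
import open3d as o3d

def preprocess_lidar(points: np.ndarray,
                     lidar_to_ego: np.ndarray,
                     voxel_size: float = 0.1) -> o3d.geometry.PointCloud:
    """Align a raw LiDAR sweep to the ego frame and voxel-downsample it.

    `points` is an (N, 3) array; `lidar_to_ego` is a 4x4 extrinsic matrix.
    """
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)

    # Rigid transform into the shared ego/vehicle frame so camera and
    # radar detections can be fused in one coordinate system.
    pcd.transform(lidar_to_ego)

    # Voxel-grid downsampling bounds the point count per unit volume,
    # keeping downstream pillar-encoding latency predictable.
    return pcd.voxel_down_sample(voxel_size=voxel_size)
```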
Built a feature-rich real-time fraud detection model using LightGBM and XGBoost, optimized for low-latency scoring on live payment streams.
Developed high-throughput feature engineering pipeline on Apache Flink, extracting device, geo, merchant, and behavioral signals with 3-second SLA.
Introduced feature freshness validation layer to prevent leakage in model inputs; reduced false positives from data staleness by 22%.
Trained on 250M historical labeled transactions, handling class imbalance with cost-sensitive learning and SMOTE variants (see the training sketch below).
Achieved a 4.7% uplift in ROC-AUC and an ~15% reduction in fraud-loss rate compared to the existing rule-based + ensemble baseline.
Deployed the model using ONNX + FastAPI and served it via a Kafka stream integrated with the company's payment gateway.
Built shadow-mode monitor to compare predictions of current vs. new model in production, ensuring safe cutover.
Integrated feedback loop from chargeback events to enable retraining every 2 days via Airflow-managed pipeline.
Coordinated with security engineering team to conduct adversarial evaluation using simulated fraud replay scenarios.
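
A minimal sketch of the cost-sensitive LightGBM setup; the synthetic data, hyperparameters, and class-weight heuristic are illustrative stand-ins for the production pipeline:

```python
import lightgbm as lgb
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative placeholders for the real feature matrix and fraud labels.
X, y = np.random.rand(10_000, 20), np.random.randint(0, 2, 10_000)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, test_size=0.2)

# Cost-sensitive learning: up-weight the rare fraud class so missed
# fraud (a false negative) costs more than a false alarm.
pos_weight = (y_tr == 0).sum() / max((y_tr == 1).sum(), 1)

model = lgb.LGBMClassifier(
    objective="binary",
    scale_pos_weight=pos_weight,  # class-imbalance correction
    n_estimators=500,
    learning_rate=0.05,
)
model.fit(
    X_tr, y_tr,
    eval_set=[(X_va, y_va)],
    eval_metric="auc",
    callbacks=[lgb.early_stopping(stopping_rounds=50)],
)
```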
Developed a dual-stage search stack combining bi-encoder retrieval (DPR) with cross-encoder reranking (BERT), powering semantic search across 200M+ documents (see the two-stage sketch below).
Used HuggingFace Transformers to fine-tune on in-domain QA pairs and click data, achieving a 12.4% MRR@10 improvement over the BM25 baseline.
Engineered offline training pipelines with Petastorm + PySpark, handling 5B+ token corpus from user logs and documentation.
Deployed FAISS-based vector store sharded across 8 servers with dynamic reloading and per-language embedding partitioning.
Integrated click feedback and dwell-time weighted relevance signals into reranker training set with online A/B test feedback.
Served the retrieval layer via Triton Inference Server and reranking via ONNX Runtime, maintaining <150 ms P95 latency for the combined stack.
Built diagnostic tool to inspect false positives/negatives by comparing attention maps across reranker layers.
A/B testing against the production stack showed a 9.7% uplift in user query-satisfaction score and a 6.2% CTR improvement on the search results page.
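
A minimal end-to-end sketch of the two-stage retrieve-then-rerank flow, assuming off-the-shelf sentence-transformers checkpoints and a flat in-memory FAISS index in place of the sharded production store:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

docs = [
    "How do I reset my account password?",
    "Enterprise pricing and billing FAQ",
    "Troubleshooting VPN connection drops",
]

bi_encoder = SentenceTransformer("sentence-transformers/multi-qa-MiniLM-L6-cos-v1")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Stage 1: dense retrieval over an inner-product FAISS index
# (embeddings are normalized, so inner product == cosine similarity).
doc_emb = bi_encoder.encode(docs, normalize_embeddings=True).astype(np.float32)
index = faiss.IndexFlatIP(doc_emb.shape[1])
index.add(doc_emb)

query = "reset my password"
q_emb = bi_encoder.encode([query], normalize_embeddings=True).astype(np.float32)
_, candidate_ids = index.search(q_emb, 2)  # top-k candidate documents

# Stage 2: cross-encoder reranking of the retrieved candidates.
pairs = [(query, docs[i]) for i in candidate_ids[0]]
scores = cross_encoder.predict(pairs)
ranked = [docs[i] for i in candidate_ids[0][np.argsort(-scores)]]
print(ranked)
```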
