I’m an AI/ML Engineer with an M.S. in Computer Science from the University of Oklahoma, specializing in fine-tuning large language models (LLMs), Retrieval-Augmented Generation (RAG), and scalable ML systems. I build production-grade GenAI applications by combining LLM optimization (LoRA, QLoRA, 4-bit inference) with strong engineering foundations in PyTorch, Hugging Face, FastAPI, Docker, and AWS.
My work spans multi-document RAG assistants, multimodal reasoning systems, and high-performance backends for data-heavy applications—always with a focus on measurable impact: higher relevance, lower latency, and more reliable ML in production.
A multi-document RAG assistant using FAISS + BM25 hybrid retrieval, conversational memory, and citation-grounded responses, with an 18% improvement in precision.
A LLaVA-powered multimodal system for visual–text reasoning, dual-image comparison, and analysis with 4-bit quantized inference for efficient local deployment.
A book recommender that personalizes suggestions through semantic and emotional understanding, built with Sentence Transformers and an interactive Gradio interface.
A domain-specific AI chatbot for e-commerce with fine-tuned GPT/LLaMA models, product-aware RAG, and real-time query resolution.
A full-stack e-commerce platform with secure auth, dynamic cart and order workflows, and an optimized Python + SQL backend with 25% faster database performance.
Email me directly at pateldeep1842@gmail.com.