AI/ML Engineer · DevOps · Observability

RANGA BASHYAM G

Building intelligent systems at scale

3.5+ years engineering end-to-end LLM & ML systems. From RAG architectures to GPU inference optimization — I bridge the gap between AI research and production-ready infrastructure.

Experience: 3.5+ years in AI/ML
Availability Target: 99.99% SLO maintained
Students Trained: 450+ across AI/ML sessions
Cloud Platforms: 3 (IBM · Azure · GCP)
LANGCHAIN ✓ ACTIVE · TRITON_INFERENCE GPU:A100 · SLO_BUDGET 99.99% · MILVUS_VECTORS +2.3M · PROMETHEUS_METRICS INGESTING · ONNX_MODEL OPTIMIZED · KUBERNETES_PODS 18/18 RUNNING · RAG_PIPELINE P99: 210ms · ANOMALY_DETECTION 0 ALERTS · IBM_COS_LAKE STREAMING · TENSORRT 4.2x SPEEDUP · SERVICENOW_TRIAGE AUTO
01

Core Stack

🧠
Intelligence Layer

AI / ML

LLMs · RAG · NLP · STT/TTS · Anomaly Detection · Predictive Analytics · Fine-tuning
AI Infrastructure

AI Stack

LangChain · LangGraph · Milvus · Chroma · Triton Server · ONNX · TensorRT
📡
Reliability Engineering

DevOps & Observability

Prometheus · Sysdig · Grafana · Osprey · SLO/SLI/SLA · Docker · Kubernetes
☁️
Cloud & Platforms

Infrastructure

IBM Cloud · Azure · GCP · AWS S3 · Nginx · GitHub Actions
🗄️
Data Engineering

Data & Pipelines

DuckDB · IBM COS · Airflow DAGs · Redis · SQL · Time-series
💻
Languages & Integrations

Programming

Python · Bash · SQL · Flask · Salesforce · Jira · ServiceNow
02

Experience

Infobell IT Solutions Pvt Ltd
AI/ML Engineer — Reliability & Forecasting
Jan 2024 – Present
● Current
Built AI-driven anomaly detection using PCA, t-tests, chi-square tests, and multivariate analysis for proactive health monitoring.
Designed LLM-powered Incident Summarizer + Change Analyzer, automating ServiceNow triage workflows.
Integrated SLO/SLI/SLA error budgets with AI analytics, supporting 99.99% system availability.
Engineered time-series ingestion pipelines into IBM COS Data Lake for scalable analytics.
Developed RAG copilots, multimodal avatar systems, and speech-impairment solutions with GPU optimization (ONNX + TensorRT).
Deployed scalable ML APIs on Azure, IBM Cloud, and Kubernetes with cross-team collaboration.
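The error-budget arithmetic behind the 99.99% availability target above is easy to sketch. A minimal version (the helper names here are illustrative, not taken from any production codebase):

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Total allowed downtime in minutes for an availability SLO over a window."""
    return window_days * 24 * 60 * (1.0 - slo)

def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (clamped at zero)."""
    budget = error_budget_minutes(slo, window_days)
    return max(0.0, 1.0 - downtime_minutes / budget)

# A 99.99% SLO over a 30-day window allows ~4.32 minutes of downtime.
print(round(error_budget_minutes(0.9999), 2))  # → 4.32
```

For example, with about 0.55 minutes of downtime spent in the window, `budget_remaining(0.9999, 0.55)` comes out near 0.873, which is the kind of figure shown as "Error Budget Remaining" on the dashboard below.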
Freelance AI Consultant
Independent
Jun 2022 – Dec 2023
Delivered intelligent chatbots and AI-powered web solutions for clients across domains.
Transitioned from backend/web development to full-stack AI project delivery.
Conducted AI/ML training sessions reaching 400+ students and 50+ faculty members.
03

Projects

P-001 / OBSERVABILITY

Analytics & Metrics Migration
Sysdig → Osprey

AI-driven anomaly detection engine for time-series system metrics using PCA, statistical analysis, and hypothesis testing. LLM-powered incident summarizer that automates ServiceNow workflows for real-time triage.

PCA · Prometheus · Sysdig · Osprey · IBM COS · LLMs · SLO/SLI
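The reconstruction-error idea behind PCA-based anomaly detection can be sketched in a few lines. This is a numpy-only toy with synthetic data, not the production pipeline (which consumed real Prometheus/Sysdig time-series): fit PCA on a healthy baseline, then flag points that sit far from the learned subspace.

```python
import numpy as np

def fit_pca(X_train: np.ndarray, n_components: int = 2):
    """Learn a healthy-baseline subspace: mean plus top principal directions."""
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    return mu, Vt[:n_components].T  # (d,) mean and (d, k) component matrix

def anomaly_scores(X: np.ndarray, mu: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Reconstruction error: distance from each sample to the learned subspace."""
    Xc = X - mu
    return np.linalg.norm(Xc - Xc @ V @ V.T, axis=1)

rng = np.random.default_rng(0)
baseline = rng.normal(size=(200, 5))              # healthy metric windows
spike = np.array([[8.0, 8.0, -8.0, 8.0, -8.0]])   # injected anomalous point
mu, V = fit_pca(baseline)
scores = anomaly_scores(np.vstack([baseline, spike]), mu, V)
print(int(scores.argmax()))  # the injected point (index 200) scores far above baseline
```

In a monitoring setting the same scoring runs over sliding windows of metrics, with a threshold calibrated on the baseline score distribution.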
P-002 / INDIC AI

Shastra Anveshi — Indic RAG for Ancient Scriptures

RAG system centered on ancient Indian texts (the Bhagavad Gita), interpreted through Shankaracharya's Advaita Vedanta. Indian-only model stack: MuRIL embeddings + Sarvam-M LLM, with retrieval fine-tuned on scripture structure for philosophical fidelity.

RAG · MuRIL · Sarvam-M · Milvus · LangChain · Fine-tuning
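The retrieval core of such a RAG system reduces to embedding and cosine-similarity ranking. A self-contained sketch: `embed` below is a toy hashed bag-of-words stand-in for a real model like MuRIL, and the brute-force scan in `retrieve` is what Milvus replaces with approximate nearest-neighbor search at scale.

```python
import hashlib

import numpy as np

def embed(text: str, dim: int = 4096) -> np.ndarray:
    """Toy hashed bag-of-words embedding; a stand-in for a real model such as MuRIL."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        idx = int.from_bytes(hashlib.sha256(tok.encode()).digest()[:4], "big") % dim
        v[idx] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Brute-force cosine-similarity ranking; a vector DB performs this search at scale."""
    q = embed(query)
    sims = [float(q @ embed(doc)) for doc in corpus]
    order = sorted(range(len(corpus)), key=lambda i: sims[i], reverse=True)
    return [corpus[i] for i in order[:k]]

verses = [
    "karmany evadhikaras te ma phalesu kadacana",
    "the self is never born and never dies",
    "yoga is skill in action",
]
top = retrieve("what happens to the self at death", verses, k=1)
```

The retrieved passages are then handed to the LLM as grounding context; the scripture-structure fine-tuning mentioned above shapes how passages are chunked and ranked.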
P-003 / INFRA AI

AMD EPYC Advisory — AI/RAG Infra Recommendation

MCP-driven AI/RAG recommendation system that maps user workloads to optimal AMD EPYC processor choices. End-to-end AI advisory flow combining LLM reasoning with production-ready deployment for cost-efficient resource utilization.

RAG · MCP · LLM Reasoning · AMD EPYC · Docker · Production API
P-004 / SPEECH AI

Speech Impairment (Dysarthria) Assistant

Complete audio pipeline — upload → transcription → live transcription → TTS feedback loop designed specifically for users with dysarthria. Cloud-native deployment with real-time processing.

STT/TTS · Python Flask · AWS S3 · Docker · Nginx
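The upload → transcription → live transcription → TTS loop can be sketched as composable stages. This is a minimal shape only: `toy_stt` and `toy_tts` are hypothetical stand-ins for the real speech models, and the chunked generator mirrors how live transcription emits partial results as audio arrives.

```python
from typing import Callable, Iterator

def live_transcribe(stream: Iterator[bytes], stt: Callable[[bytes], str]) -> Iterator[str]:
    """Emit a partial transcript after each incoming audio chunk."""
    buffer = b""
    for chunk in stream:
        buffer += chunk
        yield stt(buffer)

def feedback_loop(stream: Iterator[bytes], stt, tts) -> bytes:
    """Run live STT over the stream, then synthesize TTS feedback from the final transcript."""
    transcript = ""
    for transcript in live_transcribe(stream, stt):
        pass  # in the real app each partial transcript is pushed to the UI
    return tts(transcript)

# toy stand-ins for the real STT/TTS models
toy_stt = lambda audio: f"{len(audio)} bytes heard"
toy_tts = lambda text: f"<audio saying: {text}>".encode()

out = feedback_loop(iter([b"ab", b"cd", b"ef"]), toy_stt, toy_tts)
print(out)  # → b'<audio saying: 6 bytes heard>'
```

In the deployed version these stages sit behind a Flask API, with audio persisted to S3 and Nginx fronting the service.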
P-005 / ENTERPRISE AI

CONVOGENE Co-Pilot

Enterprise co-pilot integrated with Salesforce and Jira. Enhanced LangChain/LangGraph agentic workflows with serverless deployments via GitHub Actions and Azure, enabling automated business process orchestration.

LangGraph · Salesforce · Jira · Azure · GitHub Actions · Serverless
P-006 / GPU BENCHMARKING

NVIDIA GPU Benchmarking — Llama on A100/L40S

Systematic benchmarking of Llama models on NVIDIA A100 and L40S GPU clusters using TensorRT and Triton Inference Server, yielding up to a 4.2× TensorRT speedup and corresponding latency reduction at production scale.

TensorRT · Triton · Llama · A100 · L40S · ONNX
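The measurement core of a latency/throughput benchmark is simple to sketch. In practice a load generator drives the Triton serving endpoint; the stand-alone harness below is illustrative, with a cheap computation standing in for an inference call.

```python
import statistics
import time

def benchmark(fn, n_requests: int = 200, warmup: int = 20) -> dict:
    """Time n_requests calls to fn; report p50/p99 latency (ms) and throughput (req/s)."""
    for _ in range(warmup):  # warm caches before measuring
        fn()
    latencies = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - t0)
    wall = time.perf_counter() - start
    latencies.sort()
    p99 = latencies[min(int(0.99 * len(latencies)), len(latencies) - 1)]
    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        "p99_ms": p99 * 1e3,
        "throughput_rps": n_requests / wall,
    }

stats = benchmark(lambda: sum(range(10_000)))  # stand-in for a model inference call
```

Comparing these numbers across engine configurations (ONNX vs. TensorRT, batch sizes, GPU types) is what surfaces figures like the 4.2× speedup above.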
04

Observability Dashboard

SLO Availability: 99.99% · Error Budget Remaining: 87.3%
Model Performance: RAG P99 210ms · TensorRT Speedup 4.2× · ONNX Opt. 91% · GPU Util (A100) 88%
Pipeline Throughput — Last 12 Intervals
Cloud Stack Usage: IBM Cloud 85% · Azure 70% · GCP 45% · Kubernetes 92%
Incident Log
14:32:01 Anomaly detected in API latency — PCA isolation triggered auto-remediation RESOLVED
11:17:44 GPU memory utilization approaching threshold on L40S cluster — scaling event MONITORED
08:05:19 ServiceNow triage auto-completed by LLM incident summarizer — 0 manual actions AUTO-RESOLVED
02:41:00 IBM COS Data Lake ingestion pipeline restored after transient network event RESOLVED
05

Education & Certs

2024 – 2026 · Ongoing

M.Sc. Data Science

Kalasalingam University

2022 – 2023

PG Diploma in Data Science & Engineering

Great Lakes Institute of Management

2019 – 2022

B.Com (Computer Applications)

Vivekananda College

Certifications

Industry Credentials

Career Essentials in Generative AI — Microsoft/LinkedIn

Introduction to AI/ML Toolkits with Kubeflow (LFS147)

Let's Build Something Extraordinary

Open to AI/ML engineering roles, consulting engagements, and collaborative research. I specialize in turning complex ML research into production-grade systems.