Hi, I'm Santosh Yadav 👋😊

I design and build reliable, scalable data and machine learning infrastructure.

With a strong foundation in data engineering and machine learning operations (MLOps), I enjoy solving the "last mile" problems of getting models and pipelines into production. Whether it's building real-time data pipelines with Kafka, orchestrating workflows with Airflow, or deploying models using Docker, Kubernetes, and MLflow—I'm all about creating systems that are robust, automated, and easy to maintain.

  • 🏗️ Designing data pipelines that scale
  • 🚀 Operationalizing ML models in production environments
  • 📊 Monitoring, observability, and continuous delivery in ML workflows
  • 🤝 Bridging the gap between data science and production engineering
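
To give a flavor of the Kafka piece of that, here's a minimal sketch of producing telemetry events to a topic. It's illustrative only, assuming the kafka-python client, a broker on localhost:9092, and a made-up `telemetry` topic and event shape:

```python
# Minimal sketch: pushing telemetry events onto a Kafka topic.
# Assumes the kafka-python package and a local broker; the topic
# name and event shape are illustrative, not a real system's.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {"vehicle_id": "truck-042", "speed_kmh": 83.5, "ts": time.time()}
producer.send("telemetry", value=event)  # async send; batched internally
producer.flush()  # block until the event is actually delivered
```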

I believe in writing clean code, documenting processes, and collaborating across teams to bring data-driven products to life.

Current Role & Focus

I currently work at Scania (Traton Group), helping drive innovation in Autonomous Transport Solutions by building robust data infrastructure for large-scale machine learning systems.
My focus lies in developing reliable, scalable, and automated data pipelines to power intelligent mobility systems. From ingesting and transforming terabytes of sensor and telemetry data to orchestrating machine learning workflows in production, I build the backend that keeps autonomous systems smart and responsive.

🚛 Current Focus
  • Designing and managing distributed data pipelines using PySpark on Databricks (see the sketch after this list)
  • Leveraging AWS cloud infrastructure for compute, storage, and automation
  • Building infrastructure-as-code with Terraform, orchestrated through GitLab CI/CD
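
As a taste of the first bullet, here's a minimal PySpark sketch of the kind of batch transform this involves. It's a sketch under assumptions, not production code: the S3 paths, column names, and filter thresholds are all illustrative.

```python
# Minimal sketch of a batch clean-up job on Databricks.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("telemetry-clean").getOrCreate()

# Ingest raw, semi-structured telemetry dropped by upstream producers.
raw = spark.read.json("s3://example-bucket/raw/telemetry/")

# Deduplicate, derive a partition column, and drop implausible readings.
clean = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("speed_kmh").between(0, 200))
)

# Write partitioned Delta output for downstream ML feature pipelines.
(clean.write.format("delta")
      .mode("overwrite")
      .partitionBy("event_date")
      .save("s3://example-bucket/curated/telemetry/"))
```
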
💡 What I care about
  • Data quality, reproducibility, and observability
  • End-to-end automation and deployment of ML workflows
  • Collaboration between data, ML, and platform teams

Outside of work, I'm constantly learning about MLOps, streaming architectures, and scaling data systems for real-time decision making.

Experience

  • 2014–2019: Developer, Capital Eye Solutions
  • 2019–2021: Researcher, University of Skövde
  • 2021–2022: Data Engineer, Kambi Group
  • 2022–Present: Data Engineer, Scania Group

Tech Stack

Languages & Tools

Python
SQL

Frameworks & Platforms

Apache Airflow
dbt

Projects

Agentic AI with MCP

Agentic AI

The Model Context Protocol (MCP) provides a standard way for AI models to interact seamlessly with different data sources and applications.
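
As a rough illustration, here's what a tiny MCP server exposing one tool can look like with the official `mcp` Python SDK's FastMCP interface; the server name and the tool itself are hypothetical:

```python
# Minimal sketch of an MCP server exposing a single tool,
# assuming the official `mcp` Python SDK. The tool is a mock.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("pipeline-status")

@mcp.tool()
def pipeline_status(pipeline_name: str) -> str:
    """Report the (mocked) status of a named data pipeline."""
    # A real server would query Airflow, Databricks, etc. here.
    return f"Pipeline '{pipeline_name}' is healthy."

if __name__ == "__main__":
    mcp.run()  # stdio transport; an MCP-capable model can now call the tool
```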

Data Pipeline that scales

Data Pipeline

A pipeline that accepts raw data from various sources, processes it into meaningful information, and pushes it into storage such as a data lake or data warehouse.
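
In its simplest form, that extract-transform-load shape looks something like the sketch below, here with pandas; the file paths and column names are illustrative:

```python
# Minimal ETL sketch: accept raw data, process it, push it to storage.
# Paths and columns are hypothetical; pandas stands in for the real engine.
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    """Accept raw data, here from a CSV drop zone."""
    return pd.read_csv(path)

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn raw rows into meaningful, analysis-ready records."""
    df = raw.dropna(subset=["order_id"]).drop_duplicates("order_id")
    df["order_date"] = pd.to_datetime(df["order_date"])
    df["revenue"] = df["quantity"] * df["unit_price"]
    return df

def load(df: pd.DataFrame, path: str) -> None:
    """Push the result into warehouse-friendly storage (Parquet here)."""
    df.to_parquet(path, index=False)

load(transform(extract("raw/orders.csv")), "curated/orders.parquet")
```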

Real-Time Analytics Dashboard

Real-Time Analytics

Lakehouse Apps look set to streamline the handoff of data from data engineers, and may even let downstream teams run their own data ingestion.

Latest Articles

Building Scalable Data Pipelines

A comprehensive guide to building and maintaining production-grade data pipelines.

Read More →

Certifications

Databricks Certified Data Engineer
Databricks Generative AI Fundamentals
Databricks Lakehouse Fundamentals
AWS Certified Cloud Practitioner

Get in Touch

Want to collaborate or just chat data? Let's connect.