PRASAD VICHARE

📊 Extract → ⚙️ Transform → 💾 Load → 📈 Analyze

👋 Hi there! I’m a Data Engineer & Analytics Professional with 6+ years of experience turning raw data into business impact. I love building scalable pipelines, cloud warehouses, ML-integrated solutions, and interactive dashboards that make data work harder (so humans don’t have to 😅). I specialize in Azure, Microsoft Fabric, Databricks, Snowflake, Python, PySpark, SQL, and BI tools.

💡 I’m passionate about solving problems with data, optimizing processes, and building solutions that scale, from ETL pipelines to BI dashboards.

6+
Years of Professional Work Experience
15M–20M
Daily telecom events processed through scalable batch & streaming pipelines
35% ↑
Improvement in financial data accuracy through automated ADF & Fabric pipelines
70% ↓
Reduction in deployment errors via CI/CD pipelines using Azure DevOps & GitHub Actions

Core Technologies

The platforms and tools I build with every day

Microsoft Fabric
Azure Data Factory
Databricks
Snowflake
Apache Airflow
Spark
Python
SQL
AI & ML
Tableau
Power BI
AWS

About Me

MS in Management Information Systems, University of Illinois Chicago
Data Engineer, University of Illinois Chicago
Currently based in Chicago, IL

Work Experience

Delivering data-driven solutions across multiple domains

Data Engineer | Graduate Assistant

University of Illinois Chicago
December 2024 – Present | Chicago, IL
  • Improved financial data accuracy by 35% by designing automated Azure Data Factory pipelines and Microsoft Fabric dataflows that consolidated complex ERP datasets into a centralized Azure Lakehouse, eliminating manual reconciliation workflows.
  • Reduced ad-hoc reporting requests by ~40% by building medallion-layer (Bronze → Silver → Gold) datasets using PySpark and SQL in Databricks with enforced schema validation and automated data quality checks.
  • Enabled real-time budget monitoring for 150+ university stakeholders by engineering Delta Lake datasets powering 20+ Tableau dashboards used for variance analysis and financial planning decisions.
  • Decreased pipeline runtimes and ensured SLA-compliant data freshness by implementing Change Data Capture (CDC) and incremental load strategies across recurring finance ingestion workflows.

Data Engineer | Deputy Manager

Reliance Jio Infocomm Pvt. Ltd.
September 2020 – July 2024 | Mumbai, India
  • Processed 15M–20M daily telecom events by developing scalable batch and streaming pipelines using Apache Spark and Azure Event Hubs, enabling near-real-time analytics and spam detection capabilities.
  • Improved ML dataset reliability by transforming raw telecom data into curated Lakehouse datasets through ELT workflows built with Microsoft Fabric, PySpark, and SQL while enforcing schema evolution and data contracts.
  • Enabled parallel batch and streaming architectures with exactly-once processing guarantees by implementing near-real-time ingestion using Kafka-compatible APIs on Azure Event Hubs.
  • Accelerated query performance on multi-terabyte datasets through optimized schema design, partition pruning strategies, and Z-ordering on Delta Lake tables, improving analytics responsiveness for network monitoring teams.
  • Reduced deployment errors by 80% and standardized release workflows by designing CI/CD pipelines using Azure DevOps and GitHub Actions for ETL scripts and data models.

Data Analyst

Quess Corporation
January 2019 – September 2020 | Mumbai, India
  • Streamlined analytics for daily operations covering 50K+ records by developing automated ETL pipelines with SQL to move operational data from AWS S3/Redshift into Snowflake, ensuring timely data availability.
  • Architected a Snowflake warehouse using Medallion architecture (Bronze–Silver–Gold layers), creating optimized views and stored procedures at the gold layer to support critical reporting needs.
  • Improved leadership decision-making by delivering Power BI dashboards powered by automated data pipelines, providing interactive KPIs and drill-down insights into operational performance.

Academic Projects

Hands-on implementation of data engineering best practices

Real-Time Fraud Detection with Drift Monitoring (Azure | Streaming | ML)

November 2025
  • Designed and implemented an end-to-end real-time fraud detection system supporting batch and streaming workflows using Azure Data Factory, Event Hubs (Kafka), Databricks Structured Streaming, and Delta Lake
  • Built a Bronze–Silver–Gold lakehouse architecture on ADLS Gen2, enabling scalable data ingestion
  • Developed a machine learning pipeline with real-time feature engineering and low-latency inference to score transactions and generate fraud alerts in near real time
  • Implemented Population Stability Index (PSI)–based drift monitoring to detect feature and prediction distribution shift
Azure Event Hubs · Azure Data Factory · Apache Spark · Kafka · Machine Learning · Databricks
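
For readers unfamiliar with the drift metric named above, here is a minimal sketch of a Population Stability Index calculation in plain Python. It is an illustration only, not the project's code: the function name `population_stability_index`, the equal-width binning, the default of 10 bins, and the `eps` floor for empty bins are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i).

    `expected` is the baseline sample, `actual` the sample being checked.
    Both are assumed to be non-empty. Bin edges come from the baseline range.
    """
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 when the baseline has a single distinct value.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clipped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each share at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. Those cut-offs are a general convention, not thresholds taken from this project.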

Multi-Agent Medical Report Analysis with Hallucination Control

2025 · UIC
  • Architected a 7-stage analysis pipeline using LangChain and Google Gemini API, orchestrating parallel execution of 3 specialist AI agents via Python's ThreadPoolExecutor for concurrent inference
  • Implemented a multi-layer data validation framework, including evidence grounding checks, cross-agent consistency validation, and confidence calibration scoring to quantify hallucination risk
  • Designed a Multidisciplinary Team (MDT) agent to synthesize specialist findings into a unified diagnosis with confidence scoring and risk-level categorization
  • Built a Flask-based backend generating structured JSON/PDF outputs, enforcing schema compliance and traceable reasoning through prompt engineering and temperature control
Google Gemini API · LangChain · Multi-Agent AI · Flask · Python · LLM · Prompt Engineering
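
The concurrent fan-out to specialist agents described above can be sketched roughly as follows. This is a simplified illustration, not the project's implementation: `run_specialists` and the stub agents are hypothetical names, and plain Python callables stand in for the LangChain and Gemini calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# An agent is modeled as any callable that maps report text to findings text.
Agent = Callable[[str], str]


def run_specialists(report: str, agents: Dict[str, Agent]) -> Dict[str, str]:
    """Run every agent on the same report concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        # Submit all agents first so they run in parallel rather than one by one.
        futures = {name: pool.submit(agent, report) for name, agent in agents.items()}
        # result() blocks until that agent finishes and re-raises any exception it hit.
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for LLM-backed specialists.
def specialist_a(report: str) -> str:
    return f"A reviewed {len(report)} characters"


def specialist_b(report: str) -> str:
    return f"B reviewed {len(report.split())} words"


if __name__ == "__main__":
    findings = run_specialists(
        "sample report text",
        {"specialist_a": specialist_a, "specialist_b": specialist_b},
    )
    for name, text in findings.items():
        print(name, "->", text)
```

Threads suit this pattern because each agent spends most of its time waiting on a network response, so the calls overlap even under Python's global interpreter lock.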

Cloud Data Warehouse & Analytics using Snowflake

July 2025
  • Built Snowflake cloud data warehouse using Medallion Architecture with ETL pipelines from AWS S3
  • Implemented star schema for high-performance reporting and analytics
  • Utilized advanced SQL window functions and stored procedures for customer segmentation
  • Delivered actionable insights into product performance and sales trends
Snowflake · AWS S3 · SQL · ETL · Star Schema

Suicidal Ideation Analysis on Reddit Using an LLM

March 2025 – May 2025
  • Analyzed 225K+ Reddit posts/comments from r/SuicideWatch using Google Gemini LLM to extract emotion and behavior features, classifying 84K+ users into ‘Improved’ or ‘Declined’ emotional states
  • Built Python pipelines for data processing and longitudinal emotion tracking; applied statistical tests (Chi-Square, t-tests), multicollinearity checks (VIF), and regression models to uncover patterns in emotions, engagement, and coping strategies that could inform early intervention
Python (Pandas, NumPy) · LLM (Google Gemini) · Prompt Engineering · Machine Learning (Scikit-learn) · Statistical Testing (Chi-Square, t-tests, Regression)

Student Test Score Prediction using Machine Learning

December 2024 – January 2025
  • Analyzed student performance data and built modular ML pipelines for ingestion, preprocessing, and training, achieving 88% accuracy with Linear Regression and benchmarking against other models
  • Deployed the project using Docker and AWS (ECR, EC2) and automated CI/CD with GitHub Actions to streamline releases
Python · CI/CD · GitHub Actions · AWS · Docker

JetBlue Airlines Performance Analysis

August 2024 – October 2024
  • Compared airline performance in Tableau by analyzing 6.5 million flight records from 2023 for JetBlue Airlines
  • Utilized Python (pandas) for data cleaning and analysis, and designed interactive Tableau dashboards to visualize key metrics and identify patterns in delays and operational inefficiencies
Python · Tableau · MS Excel

Sales Dashboard Development Using Tableau

May 2024 – July 2024
  • Developed an interactive Tableau dashboard for stakeholders to analyze year-over-year sales metrics, monthly trends, and performance insights
  • Improved decision-making by providing stakeholders with insights into sales performance and customer behavior
Python · Tableau · MS Excel

Technical Skills

Comprehensive toolkit for modern data engineering

Data Engineering & Warehousing

Microsoft Fabric · Azure Data Factory · Azure Event Hubs · Azure Data Lakehouse · Snowflake · Apache Spark · Kafka · Databricks · Delta Lake · Apache Airflow · dbt · ETL/ELT Pipelines · Medallion Architecture · Change Data Capture (CDC) · Structured Streaming · Data Modeling · MS SQL Server · MySQL

Programming & Data Processing

Python · SQL · T-SQL · Spark SQL · PySpark · Pandas · NumPy · Shell Scripting · Linux

Analytics & Visualization

Power BI (DAX) · Semantic Modeling · Tableau · MS Excel · Pivot Tables · Power Query

Machine Learning & Automation

Scikit-learn · Predictive Analysis · LLMs · Workflow Automation · AI Agents

Project Management

Agile · SDLC · Stakeholder Management · Jira · Microsoft Office Suite

Cloud & DevOps

Azure · AWS S3 · AWS Redshift · Azure DevOps · Git · GitHub Actions · Docker · CI/CD

Certifications

Validated expertise in cloud and data analytics

Databricks Certified Data Engineer Associate

DE Associate

Microsoft Certified: Fabric Data Engineer Associate

DP-700

AWS Academy Data Engineering

Google Data Analytics Professional Certificate

Snowflake – The Complete Master Class

Get In Touch

Let's discuss how I can help with your data engineering needs

Location
Chicago, Illinois, USA

Available for full-time opportunities
