PRASAD VICHARE

📊 Extract → ⚙️ Transform → 💾 Load → 📈 Analyze

👋 Hi there! I’m a Data Engineer & Analytics Professional with 6+ years of experience turning raw data into business impact. I love building scalable pipelines, cloud warehouses, ML-integrated solutions, and interactive dashboards that make data work harder (so humans don’t have to 😅). I specialize in Azure, Microsoft Fabric, Databricks, Snowflake, Python, PySpark, SQL, and BI tools.

💡 I’m passionate about solving problems with data, optimizing processes, and building solutions that scale, from ETL pipelines to BI dashboards.

6+

Years of Professional Work Experience

20M+

Daily telecom events processed through scalable batch & streaming pipelines

35% ↑

Improvement in financial data accuracy through automated ADF & Fabric pipelines

70% ↓

Reduction in deployment errors via CI/CD pipelines using Azure DevOps & GitHub Actions

Core Technologies

The platforms and tools I build with every day

Microsoft Fabric

Azure Data Factory

Databricks

❄️ Snowflake

Apache Airflow

Spark

Python

SQL

AI & ML

Tableau

Power BI

AWS


About Me

MS Management Information Systems

University of Illinois Chicago

Data Engineer

University of Illinois Chicago

Chicago, IL

Currently Based

Work Experience

Delivering data-driven solutions across multiple domains

Data Engineer | Graduate Assistant

University of Illinois Chicago

December 2024 – Present | Chicago, IL

Improved financial data accuracy by 35% by designing automated Azure Data Factory pipelines and Microsoft Fabric dataflows that consolidated complex ERP datasets into a centralized Azure Lakehouse, eliminating manual reconciliation workflows.
Reduced ad-hoc reporting requests by ~40% by building medallion-layer (Bronze → Silver → Gold) datasets using PySpark and SQL in Databricks with enforced schema validation and automated data quality checks.
Enabled real-time budget monitoring for 150+ university stakeholders by engineering Delta Lake datasets powering 20+ Tableau dashboards used for variance analysis and financial planning decisions.
Decreased pipeline runtimes and ensured SLA-compliant data freshness by implementing Change Data Capture (CDC) and incremental load strategies across recurring finance ingestion workflows.
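An incremental load of the kind mentioned above can be illustrated with a high-watermark filter. This is a minimal sketch only, not the production pipeline: the `updated_at` field, the row shape, and the `incremental_batch` helper are assumptions made for illustration, and a timestamp watermark is a simpler stand-in for log-based CDC.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Row = Dict[str, object]


def incremental_batch(rows: List[Row], watermark: datetime) -> Tuple[List[Row], datetime]:
    """Return only rows changed after the watermark, plus the advanced watermark.

    Assumes each row carries a hypothetical `updated_at` datetime column.
    """
    # Keep only rows modified since the previous successful run.
    changed = [r for r in rows if r["updated_at"] > watermark]
    # Advance the watermark to the newest change seen; keep it unchanged if nothing is new.
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark


source = [
    {"id": 1, "amount": 100, "updated_at": datetime(2025, 1, 1)},
    {"id": 2, "amount": 250, "updated_at": datetime(2025, 1, 3)},
    {"id": 3, "amount": 75, "updated_at": datetime(2025, 1, 5)},
]
batch, next_watermark = incremental_batch(source, datetime(2025, 1, 2))
```

Persisting the returned watermark only after the batch is written successfully means a failed run is retried from the same starting point.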

Data Engineer | Deputy Manager

Reliance Jio Infocomm Pvt. Ltd.

September 2020 – July 2024 | Mumbai, India

Processed 15M–20M daily telecom events by developing scalable batch and streaming pipelines using Apache Spark and Azure Event Hub, enabling near-real-time analytics and spam detection capabilities.
Improved ML dataset reliability by transforming raw telecom data into curated Lakehouse datasets through ELT workflows built with Microsoft Fabric, PySpark, and SQL while enforcing schema evolution and data contracts.
Enabled parallel batch and streaming architectures with exactly-once processing guarantees by implementing near-real-time ingestion using Kafka-compatible APIs on Azure Event Hub.
Accelerated query performance on multi-terabyte datasets through optimized schema design, partition pruning strategies, and Z-ordering on Delta Lake tables, improving analytics responsiveness for network monitoring teams.
Reduced deployment errors by 80% and standardized release workflows by designing CI/CD pipelines using Azure DevOps and GitHub Actions for ETL scripts and data models.

Data Analyst

Quess Corporation

January 2019 – September 2020 | Mumbai, India

Streamlined analytics for daily operations covering 50K+ records by developing automated ETL pipelines with SQL to move operational data from AWS S3/Redshift into Snowflake, ensuring timely data availability.
Architected a Snowflake warehouse using Medallion architecture (Bronze–Silver–Gold layers), creating optimized views and stored procedures at the gold layer to support critical reporting needs.
Improved leadership decision-making by delivering Power BI dashboards powered by automated data pipelines, providing interactive KPIs and drill-down insights into operational performance.

Academic Projects

Hands-on implementation of data engineering best practices

Real-Time Fraud Detection with Drift Monitoring (Azure | Streaming | ML)

Nov 2025

Designed and implemented an end-to-end real-time fraud detection system supporting batch and streaming workflows using Azure Data Factory, Event Hubs (Kafka), Databricks Structured Streaming, and Delta Lake
Built a Bronze–Silver–Gold lakehouse architecture on ADLS Gen2, enabling scalable data ingestion
Developed a machine learning pipeline with real-time feature engineering and low-latency inference to score transactions and generate fraud alerts in near real time
Implemented Population Stability Index (PSI)–based drift monitoring to detect feature and prediction distribution shift

Azure Event Hub · Azure Data Factory · Apache Spark · Kafka · Machine Learning · Databricks
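The PSI drift check mentioned in this project can be sketched in a few lines. This is an illustrative version under stated assumptions, not the project's actual code: equal-width bins derived from the baseline sample, a small `eps` floor for empty bins, and the function name `psi` are all choices made here. Both samples are assumed to be non-empty.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 when the baseline is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp so values outside the baseline range land in the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
stable_score = psi(baseline, baseline)
drift_score = psi(baseline, shifted)
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, though the right thresholds depend on the feature and the use case.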

Multi-Agent Medical Report Analysis with Hallucination Control

2025 · UIC

Architected a 7-stage analysis pipeline using LangChain and Google Gemini API, orchestrating parallel execution of 3 specialist AI agents via Python's ThreadPoolExecutor for concurrent inference
Implemented a multi-layer data validation framework, including evidence grounding checks, cross-agent consistency validation, and confidence calibration scoring to quantify hallucination risk
Designed a Multidisciplinary Team (MDT) agent to synthesize specialist findings into a unified diagnosis with confidence scoring and risk-level categorization
Built a Flask-based backend generating structured JSON/PDF outputs, enforcing schema compliance and traceable reasoning through prompt engineering and temperature control

Google Gemini API · LangChain · Multi-Agent AI · Flask · Python · LLM · Prompt Engineering
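The parallel specialist step can be illustrated with `ThreadPoolExecutor` from the standard library. This is a hedged sketch: the agent names and the `make_agent` and `run_specialists` helpers are hypothetical, and the stub functions stand in for the LangChain/Gemini calls, which are not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def make_agent(name: str) -> Callable[[str], str]:
    """Build a stub specialist; a real agent would call the LLM API here."""
    def agent(report: str) -> str:
        return f"{name}: reviewed {len(report)} characters"
    return agent


# Hypothetical specialist names, used only for illustration.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    name: make_agent(name)
    for name in ("specialist_a", "specialist_b", "specialist_c")
}


def run_specialists(report: str) -> Dict[str, str]:
    """Run every specialist on the same report concurrently and collect findings."""
    with ThreadPoolExecutor(max_workers=len(SPECIALISTS)) as pool:
        # Submit all agents first so their I/O-bound calls overlap.
        futures = {name: pool.submit(agent, report)
                   for name, agent in SPECIALISTS.items()}
        # result() blocks until that agent finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}


findings = run_specialists("sample report text")
```

Threads suit this step because each agent spends most of its time waiting on a network response, so the calls overlap despite Python's global interpreter lock.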

Cloud Data Warehouse & Analytics using Snowflake

July 2025

Built Snowflake cloud data warehouse using Medallion Architecture with ETL pipelines from AWS S3
Implemented star schema for high-performance reporting and analytics
Utilized advanced SQL window functions and stored procedures for customer segmentation
Delivered actionable insights into product performance and sales trends

Snowflake · AWS S3 · SQL · ETL · Star Schema

Suicide Ideation on Reddit using LLM

March 2025 – May 2025

Analyzed 225K+ Reddit posts/comments from r/SuicideWatch using Google Gemini LLM to extract emotion and behavior features, classifying 84K+ users into ‘Improved’ or ‘Declined’ emotional states
Built Python pipelines for data processing and longitudinal emotion tracking; applied statistical tests (Chi-Square, T-tests, VIF) and regression models to uncover patterns in emotions, engagement, and coping strategies for early interventions

Python (Pandas, NumPy) · LLM (Google Gemini) · Prompt Engineering · Machine Learning (Scikit-learn) · Statistical Testing (Chi-Square, T-tests, Regression)

Student Test Score Prediction using Machine Learning

December 2024 – January 2025

Analyzed student performance data and built modular ML pipelines for ingestion, preprocessing, and training, achieving 88% accuracy with Linear Regression and benchmarking against other models
Deployed the project using Docker and AWS (ECR, EC2) and implemented GitHub Actions for efficient CI/CD to streamline deployment

Python · CI/CD · GitHub Actions · AWS · Docker

JetBlue Airlines Performance Analysis

August 2024 – October 2024

Conducted a comparative airline performance analysis for JetBlue Airlines in Tableau, covering 6.5 million flight records from 2023
Utilized Python (pandas) for data cleaning and analysis, and designed interactive Tableau dashboards to visualize key metrics and identify patterns in delays and operational inefficiencies

Python · Tableau · MS Excel

Sales Dashboard Development Using Tableau

May 2024 – July 2024

Developed an interactive Tableau dashboard for stakeholders to analyze year-over-year sales metrics, monthly trends, and performance insights
Improved decision-making by providing stakeholders with insights into sales performance and customer behaviors

Python · Tableau · MS Excel

Technical Skills

Comprehensive toolkit for modern data engineering

Data Engineering & Warehousing

Microsoft Fabric · Azure Data Factory · Azure Event Hub · Azure Data Lakehouse · Snowflake · Apache Spark · Kafka · Databricks · Delta Lake · Apache Airflow · dbt · ETL/ELT Pipelines · Medallion Architecture · Change Data Capture (CDC) · Structured Streaming · Data Modeling · MS SQL Server · MySQL

Programming & Data Processing

Python · SQL · T-SQL · Spark SQL · PySpark · Pandas · NumPy · Shell Scripting · Linux

Analytics & Visualization

Power BI (DAX) · Semantic Modeling · Tableau · MS Excel · Pivot Tables · Power Query

Machine Learning & Automation

Scikit-learn · Predictive Analysis · LLM · Workflow Automation · AI Agents

Project Management

Agile · SDLC · Stakeholder Management · Jira · Microsoft Office Suite

Cloud & DevOps

Azure · AWS S3 · AWS Redshift · Azure DevOps · Git · GitHub Actions · Docker · CI/CD

Certifications

Validated expertise in cloud and data analytics

Databricks Certified Data Engineer Associate

DE Associate

Microsoft Certified: Fabric Data Engineer Associate

DP-700

AWS Academy Data Engineering

Google Data Analytics Professional Certificate

Snowflake – The Complete Master Class

Get In Touch

Let's discuss how I can help with your data engineering needs

Email
pvich@uic.edu

Phone
+1 312-459-8976

Location
Chicago, Illinois, USA

Available for full-time opportunities
