My Projects

Data science and statistical modeling projects showcasing my analytical skills

Featured Work

XANE Chatbot

An AI-powered assistant for interacting with my portfolio, answering queries, and processing files.

  • Rule-based system with LLM integration via OpenRouter API
  • Handles text, images, and PDFs using Supabase for storage
  • Image generation with the Flux and Turbo models via Pollinations
  • Built with Streamlit for a seamless user experience
Python, Streamlit, Supabase, OpenRouter
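To illustrate the "rule-based system with LLM integration" idea behind the XANE Chatbot, here is a minimal sketch, not the chatbot's actual code. It assumes the rules are tried first and a language model is called only when no rule matches. The patterns, replies, and the `fake_llm` stand-in are all hypothetical; a real deployment would inject a function that calls the model provider.

```python
import re
from typing import Callable, Optional

# Hypothetical rules as (pattern, canned reply) pairs; not taken from XANE itself.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.I),
     "Hello! Ask me about the projects in this portfolio."),
    (re.compile(r"\bprojects?\b", re.I),
     "The portfolio covers statistical modeling, econometrics, and time series work."),
]


def match_rule(message: str) -> Optional[str]:
    """Return the first canned reply whose pattern matches, else None."""
    for pattern, reply in RULES:
        if pattern.search(message):
            return reply
    return None


def answer(message: str, llm_fallback: Callable[[str], str]) -> str:
    """Try the rules first; call the injected LLM function only on a miss."""
    reply = match_rule(message)
    return reply if reply is not None else llm_fallback(message)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real API client (assumption for illustration only).
    return f"[LLM] {prompt}"
```

Passing the fallback in as a plain callable keeps the rule layer testable without network access: `answer("hello there", fake_llm)` is served by a rule, while an unmatched question is handed to the fallback.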
Statistical Modeling Toolkit

Semi-Automated Statistical Modeling Toolkit

A comprehensive suite of statistical modeling tools focusing on rigorous analysis, proper diagnostics, and interpretable results rather than pure predictive performance.

  • Three specialized modules for Linear Regression, Logistic Regression, and Time Series Analysis
  • Emphasis on statistical assumptions, diagnostic checks, and model interpretation
  • Automated model diagnostics including multicollinearity, normality, and heteroscedasticity checks
  • Interactive visualizations for model evaluation and assumption verification
  • Detailed explanations of statistical concepts and results interpretation
Python, Statsmodels, Streamlit, Plotly
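As an example of the kind of normality diagnostic the Statistical Modeling Toolkit automates, the sketch below computes the Jarque-Bera statistic on regression residuals using only the standard library. It is an illustration under stated assumptions, not the toolkit's code: the function name is hypothetical, and the p-value relies on the asymptotic chi-square distribution with 2 degrees of freedom, whose survival function is exp(-x/2).

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments with divisor n, as in the usual JB definition.
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Chi-square(2) survival function; only valid asymptotically.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value
```

A small p-value suggests the residuals depart from normality. With the small samples common in teaching examples the asymptotic p-value is only a rough guide, so a Q-Q plot is a useful companion check.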
Spanish Dairy Farms Analysis

Fixed Effects Panel Data Analysis of Spanish Dairy Farms

Econometric analysis of dairy farm productivity using panel data techniques.

  • Analyzed panel data (247 farms, 1,482 observations) using Cobb-Douglas and Translog models in R
  • Identified key drivers by output elasticity (a 1% increase in Cows or Feed is associated with a 0.66% or 0.38% increase in output, respectively; p < 0.001), with increasing returns to scale (RTS = 1.11)
  • Ensured robustness with clustered errors and log transformation, achieving ~80% adjusted R²
View on GitHub
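The fixed effects idea used in the dairy farm analysis can be sketched in a few lines. The original work was done in R; the Python below is only an illustration with a single regressor and made-up numbers. It demeans x and y within each farm (the "within" transformation) and then fits an ordinary least squares slope, which removes any farm-specific intercept. With both variables in logs, that slope reads as an elasticity.

```python
from collections import defaultdict


def within_slope(rows):
    """Fixed effects (within) slope for rows of (unit_id, x, y)."""
    groups = defaultdict(list)
    for unit, x, y in rows:
        groups[unit].append((x, y))

    sxy = 0.0
    sxx = 0.0
    for obs in groups.values():
        # Demean within each unit so unit-specific intercepts drop out.
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx


# Synthetic data (not from the study): two farms with different
# intercepts (1 and 3) but the same slope of 0.5.
toy_rows = [
    ("farm_a", 1.0, 1.5), ("farm_a", 2.0, 2.0), ("farm_a", 3.0, 2.5),
    ("farm_b", 2.0, 4.0), ("farm_b", 3.0, 4.5), ("farm_b", 4.0, 5.0),
]
slope = within_slope(toy_rows)
```

Pooling these six rows without demeaning would mix the intercept gap between the two farms into the slope; the within transformation recovers the common slope instead.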
VIX Time Series Analysis

Time Series Analysis of the Volatility Index

Forecasting and modeling of market volatility using advanced time series techniques.

  • Applied AR(1) and cubic models to daily VIX data (1990–2025) in R, with 6.67% MAPE
  • Confirmed stationarity (p = 0.188) and validated with Box-Ljung and Box-Pierce tests
  • Forecasted VIX levels (19.3–28.65) through 2034 with confidence intervals
View on GitHub
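For readers unfamiliar with the VIX project's terms, here is a small sketch of fitting a first-order autoregressive model by least squares and scoring it with MAPE. It assumes the autoregressive model in that project is first-order, uses a synthetic series rather than VIX data, and is written in Python for illustration even though the analysis itself was done in R. The function names are hypothetical.

```python
def fit_ar1(series):
    """Least squares fit of y_t = c + phi * y_{t-1}; returns (c, phi)."""
    lagged = series[:-1]
    current = series[1:]
    n = len(lagged)
    x_bar = sum(lagged) / n
    y_bar = sum(current) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(lagged, current))
    sxx = sum((x - x_bar) ** 2 for x in lagged)
    phi = sxy / sxx
    c = y_bar - phi * x_bar
    return c, phi


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Synthetic, noise-free series generated by y_t = 4 + 0.8 * y_{t-1}.
synthetic = [10.0]
for _ in range(20):
    synthetic.append(4.0 + 0.8 * synthetic[-1])

c_hat, phi_hat = fit_ar1(synthetic)
one_step = [c_hat + phi_hat * y for y in synthetic[:-1]]
in_sample_mape = mape(synthetic[1:], one_step)
```

Because the synthetic series has no noise, the fit recovers the generating coefficients and the in-sample MAPE is essentially zero; real volatility data would of course leave a nonzero error.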
Blood Pressure Modeling

Statistical Modeling of Blood Pressure

Regression analysis of hypertensive patient data to identify key predictors.

  • Built multiple linear regression models in R on 20 hypertensive patients
  • Identified key predictors (Age, BSA, Pulse) achieving adjusted R² = 90.1%
  • Addressed multicollinearity (removed Weight) and validated models via ANOVA
View on GitHub
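Two quantities from the blood pressure project, the variance inflation factor (VIF) and adjusted R², are easy to show in code. This is a sketch in Python with hypothetical helper names, not the R code used in the project. The VIF shortcut below holds only when a predictor is checked against exactly one other predictor, in which case the auxiliary R² equals the squared Pearson correlation. The adjusted R² formula assumes a model with an intercept and p predictors.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF = 1 / (1 - r^2); valid for the two-predictor case only."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

A VIF near 1 means the predictor shares little variance with the other one. Larger values signal overlap, which is the usual motivation for dropping one of a strongly correlated pair.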
Big Data Sampling Research

Sampling Techniques in Big Data and Stream Data Analytics

Academic research on efficient sampling methods for large-scale and real-time data analysis.

  • Comparative analysis of sampling techniques (Simple Random, Stratified, Reservoir)
  • Practical implementation examples using Python and Dask
  • Case studies on social media analytics and IoT data streams
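Of the techniques compared in the sampling research, reservoir sampling is the one designed for streams whose length is not known in advance. The sketch below is a standard-library version of the classic single-pass approach (often called Algorithm R), offered as an illustration rather than as the implementation from the paper. The function name and the optional `rng` parameter are assumptions made for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items drawn uniformly from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

The function holds only `k` items in memory at any time, so it works on generators and other one-pass sources such as sensor feeds. If the stream has fewer than `k` items, all of them are returned.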
Social Media Impact Research

Social Media and Its Impact on Society

Comprehensive study analyzing multidimensional effects of social media platforms.

  • Examined 12 key variables including mental health, education, and politics
  • Literature review of 15+ academic studies published 2019–2024
  • Designed comprehensive survey instrument covering all research dimensions
  • Policy recommendations for balanced social media usage
Survey Design Highlights
  • 9 thematic sections (Demographics, Time Use, Mental Health, etc.)
  • 40+ carefully crafted questions
  • Multiple question formats (MCQ, Likert scales, open-ended)
  • Variables mapped to research hypotheses