Projects

A selection of coursework-based and exploratory projects that form my current foundation in modeling, data analysis, and information structure.

Most of this work was completed as part of my training in applied mathematics, data science, and programming. I use these projects to build technical depth while moving toward questions in clinical information quality and health data systems.

1. Modeling and data analysis (Python · statistics · ML)

Text-based classification: spam, movies, and signals in language

A series of projects using Python and scikit-learn to explore how simple models behave on text-derived features.

  • Spam / ham classification — engineered a binary classifier to distinguish spam from non-spam emails using text-based features, focusing on feature construction, preprocessing pipelines, and regularization to reduce overfitting.
  • Movie genre classification — implemented a k-nearest-neighbors classifier to distinguish thriller vs. comedy screenplays based on phrase frequency patterns, illustrating how representation choice shapes model behavior.

Tools: Python, scikit-learn, pandas, basic feature engineering, evaluation metrics.
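
As a rough illustration of the pipeline structure behind the spam/ham project, here is a minimal scikit-learn sketch. The toy corpus, labels, and exact model choices (TF-IDF features, L2-regularized logistic regression) are assumptions for illustration, not the coursework itself:

```python
# Minimal text-classification pipeline: raw text -> TF-IDF features ->
# regularized logistic regression. Corpus and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now", "claim your free money",
    "meeting moved to tuesday", "lunch at noon tomorrow",
    "free free win money now", "notes from today's meeting",
]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

# The regularization strength C is the kind of knob used to curb overfitting.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(C=1.0))
model.fit(emails, labels)

print(model.predict(["free prize money"])[0])
print(model.predict(["notes from the meeting"])[0])
```

The same pipeline shape carries over to the genre-classification project; only the feature representation and the estimator (k-nearest neighbors instead of logistic regression) change.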

Tabular modeling: housing prices and structured real-world data

Course-based work that uses regression models to understand variability in housing prices and to practice reasoning about model assumptions and error behavior.

  • Cook County housing exploration — performed exploratory data analysis to understand structure and missingness, engineered additional features, and fit linear models to predict housing prices; evaluated model performance and considered ways to improve generalization.

Tools: Python, pandas, scikit-learn, regression, error analysis, exploratory data analysis.
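
The workflow described above can be sketched in a few lines on synthetic data (the Cook County dataset itself is not reproduced here; the price structure and feature names are invented):

```python
# Tabular regression sketch: build features, hold out a test set,
# fit a linear model, and report held-out error.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(500, 3500, n)
bedrooms = rng.integers(1, 6, n)
# Hypothetical linear price structure plus noise.
price = 50_000 + 120 * sqft + 15_000 * bedrooms + rng.normal(0, 20_000, n)

X = np.column_stack([sqft, bedrooms])
X_train, X_test, y_train, y_test = train_test_split(X, price, random_state=0)

model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"held-out RMSE: {rmse:,.0f}")
```

Comparing held-out RMSE against training RMSE is one simple way to reason about the generalization questions mentioned above.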

Climate, population, and poverty data analysis

Projects working with climate and demographic datasets to practice statistical inference, visualization, and interpretation under real-world noise.

  • Climate change data analysis — applied basic statistical inference methods to historical daily temperature and precipitation data from weather stations across U.S. cities, examining trends and variability.
  • Global population and poverty — analyzed global indicators such as life expectancy, fertility, and child mortality to explore their relationship with population growth and poverty dynamics.

Tools: Python, numpy, matplotlib, statistics, exploratory analysis, simple inference.
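
The kind of inference used in the climate project can be sketched as a trend estimate plus a permutation test. The data below are synthetic (a hypothetical warming of 0.02 °C/year plus noise), not the station records themselves:

```python
# Fit a least-squares slope to yearly mean temperatures, then shuffle the
# temperatures to ask how often a slope that large arises by chance.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1950, 2020)
temps = 12.0 + 0.02 * (years - years[0]) + rng.normal(0, 0.4, years.size)

def slope(x, y):
    return np.polyfit(x, y, 1)[0]

observed = slope(years, temps)

# Permutation test: shuffling breaks any year-temperature association.
perm_slopes = np.array(
    [slope(years, rng.permutation(temps)) for _ in range(2000)]
)
p_value = np.mean(np.abs(perm_slopes) >= abs(observed))
print(f"slope = {observed:.4f} deg/yr, p = {p_value:.4f}")
```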

2. Analytical tools and visualization (R · Shiny · ggplot)

Shiny web app for mortgage accumulation

Developed a Shiny-based financial simulator to compute mortgage payments under varying assumptions. The interface allows users to input property price, down payment, term, and interest rate, and returns derived quantities such as monthly payment, total interest, and total cost.

Tools: R, Shiny, data cleaning, basic financial modeling, tabular and graphical output.
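
The core calculation behind a simulator like this is the standard fixed-rate annuity formula. Shown here as a Python sketch rather than the Shiny app's R code; the loan figures are illustrative:

```python
# Fixed monthly payment for a fully amortizing loan:
# payment = P * r / (1 - (1 + r)^-n), with monthly rate r and n payments.
def monthly_payment(principal, annual_rate, years):
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)

price, down = 400_000, 80_000
payment = monthly_payment(price - down, 0.06, 30)
total_cost = down + payment * 30 * 12
total_interest = total_cost - price
print(f"monthly: {payment:,.2f}  total interest: {total_interest:,.2f}")
```

Derived quantities like total interest and total cost fall out directly once the payment is known, which is what makes the interactive what-if interface cheap to build.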

Mortgage amortization visualization

Constructed ggplot / ggplotly visualizations showing the evolution of principal, interest, and remaining balance over time. The focus was on making a dense time series legible, with attention to what different stakeholders (e.g., potential homebuyers) might need to see in order to reason about long-term commitments.

Tools: R, ggplot, ggplotly, statistics, data visualization.
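
The schedule underlying those plots splits each fixed payment into an interest part (rate times remaining balance) and a principal part (the rest). A minimal Python sketch, with illustrative loan terms:

```python
# Amortization schedule: for each month, interest = balance * monthly rate,
# principal = payment - interest, and the balance shrinks accordingly.
def amortization_schedule(principal, annual_rate, years):
    r, n = annual_rate / 12, years * 12
    payment = principal * r / (1 - (1 + r) ** -n)
    balance, rows = principal, []
    for month in range(1, n + 1):
        interest = balance * r
        paid_principal = payment - interest
        balance -= paid_principal
        rows.append((month, interest, paid_principal, max(balance, 0.0)))
    return rows

rows = amortization_schedule(320_000, 0.06, 30)
first, last = rows[0], rows[-1]
print(f"month 1:   interest {first[1]:,.2f}, principal {first[2]:,.2f}")
print(f"month 360: interest {last[1]:,.2f}, principal {last[2]:,.2f}")
```

Plotting the interest and principal columns over time is what makes the early-years dominance of interest visually obvious to a homebuyer.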

These projects reflect an early interest in how numeric structure can be made more interpretable through visual and interactive tools—a theme that connects to later interests in clinical information presentation.

3. Programming foundations and interpreters (Python · Scheme · MATLAB)

Core programming projects (CS61A-style)

A set of projects designed to strengthen comfort with abstraction, control flow, and data structures.

  • Ants vs. SomeBees — implemented a tower defense game in Python using object-oriented programming, focusing on class design and interactions between entities.
  • Scheme interpreter — implemented an interpreter for a subset of the Scheme language, practicing environment models, evaluation rules, and recursive structure.
  • Autocorrected typing & dice game — created a typing-speed tool with autocorrect features and a dice game with strategy simulation, emphasizing higher-order functions and control logic.

Tools: Python, Scheme, functional programming concepts, object-oriented programming.
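
The evaluation rule at the heart of the interpreter project can be sketched in miniature: expressions are nested lists, and eval dispatches on their form. This toy version (my own simplification, not the project code) handles only numbers, variable lookup, `define`, and calls to built-in procedures:

```python
# Tiny Scheme-style evaluator over nested Python lists.
import operator

def scheme_eval(expr, env):
    if isinstance(expr, (int, float)):      # self-evaluating
        return expr
    if isinstance(expr, str):               # symbol: look up in environment
        return env[expr]
    op, *args = expr
    if op == "define":                      # special form: bind a name
        name, value_expr = args
        env[name] = scheme_eval(value_expr, env)
        return name
    proc = scheme_eval(op, env)             # call: eval operator, then operands
    return proc(*[scheme_eval(a, env) for a in args])

global_env = {"+": operator.add, "*": operator.mul}
scheme_eval(["define", "x", 7], global_env)
print(scheme_eval(["+", "x", ["*", 2, 3]], global_env))  # (+ x (* 2 3)) -> 13
```

The full project layers on proper environment frames, user-defined procedures, and tail-call handling, but the recursive eval/apply shape is the same.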

Numerical methods and interactive interfaces in MATLAB

MATLAB-based work combining numerical routines with basic interface design.

  • Modified zero-in root finding — implemented a root-finding routine using a combination of bisection and inverse quadratic interpolation to locate roots within a specified tolerance, reinforcing numerical analysis concepts.
  • Tarot card game — collaborated on a small interactive game with a MATLAB GUI, including card layout, button actions, and simple sound effects.

Tools: MATLAB, numerical methods, basic UI design, teamwork.

These projects serve primarily as programming and computational foundations rather than as independent research; they shape how I think about building systems and reasoning about their behavior.

4. Tools and ongoing directions

Beyond coursework, I am gradually building small tools and conceptual prototypes that connect more directly to questions of information clarity and real-world decision-making.

  • Shiny-based analytical interfaces — lightweight interfaces for exploring numeric patterns and parameter changes, used as a way to think about how non-technical users interact with data.
  • Conceptual work on information clarity — small experiments on how different structures and visual representations affect comprehension, which parallel questions I am interested in within clinical documentation.
  • Early-stage prototypes related to health information — currently at the level of framing and small exploratory experiments, rather than formal studies.

As I move further into health data science and clinical information systems, I expect future projects to focus more explicitly on electronic health records, documentation variability, and data quality.

Code & links

Many of the projects above were completed as part of coursework and are not publicly available in full source form for academic integrity reasons. When possible, I share high-level descriptions, selected snippets, or related tools instead of full assignments.