Improve ML basics tutorial notebook #424

Open
duhmariya wants to merge 1 commit into school-brainhack:main from duhmariya:improve-ml-basics-notebook

Conversation

@duhmariya

Improved ML Basics Tutorial Notebook

Changes:

  • Fixed data leakage: StandardScaler is now applied within the train/test split rather than to X_train before it is passed to cross_val_score
  • Added regression section (Linear Regression on Diabetes dataset with MSE and R² evaluation)
  • Added unsupervised learning section (K-Means clustering)
  • Added overfitting/underfitting demonstration (KNN with varying k from 1 to 30)
  • Improved visualizations with labeled axes, titles, and confusion matrix
  • Added variance explained analysis for cross-validation
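
The overfitting/underfitting demonstration listed above could look roughly like the sketch below. It is not the notebook's code: the Iris dataset, the 70/30 split, and the variable names (`k_values`, `train_acc`, `test_acc`) are assumptions made for illustration. Small k tends to fit the training set closely (overfitting), while large k smooths the decision boundary and can underfit.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: Iris stands in for whatever dataset the notebook uses.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

k_values = list(range(1, 31))  # k from 1 to 30, as in the PR description
train_acc, test_acc = [], []

for k in k_values:
    # The scaler lives inside the pipeline, so it is fit on X_train only.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    model.fit(X_train, y_train)
    train_acc.append(model.score(X_train, y_train))
    test_acc.append(model.score(X_test, y_test))

# Print a few rows to compare training and test accuracy as k grows.
for k in (1, 5, 15, 30):
    print(f"k={k:2d}  train={train_acc[k - 1]:.3f}  test={test_acc[k - 1]:.3f}")
```

A widening gap between training and test accuracy at small k suggests overfitting; both curves dropping together at large k suggests underfitting. For actually choosing k, cross-validation on the training set is preferable to reading it off the test scores.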

Motivation:

The previous notebook standard-scaled X_train before passing it to cross_val_score. Because the scaler had already seen every row, each validation fold influenced the mean and variance used to scale its own training folds, which is data leakage. This version restructures the pipeline to follow ML best practices while covering classification, regression, and clustering.
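
One common way to avoid this kind of leakage is to wrap the scaler and the estimator in a scikit-learn pipeline, so that `cross_val_score` refits the scaler on the training portion of every fold. The sketch below is a minimal illustration under assumptions, not necessarily what this PR does: the Iris dataset, the KNN classifier, and the split parameters are placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: Iris and KNN are placeholders for the notebook's data and model.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaler and classifier are bundled, so each CV fold fits its own scaler
# on that fold's training rows and only transforms the validation rows.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("CV accuracy per fold:", cv_scores.round(3))

# Final fit on the full training set; the test set is only transformed.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print("Held-out test accuracy:", round(test_accuracy, 3))
```

Passing raw `X_train` with the pipeline, rather than a pre-scaled array with a bare estimator, is what keeps validation-fold statistics out of the scaler.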
