Recipe for Rating: Predict Food Ratings using ML
Project Overview: Recipe for Rating leverages machine learning models to predict food ratings from user-submitted reviews and associated numeric and categorical data. By preprocessing text, numeric, and categorical features, the project trains multiple models to classify ratings effectively, including advanced two-stage classification strategies to handle dataset bias.
Project source code: https://github.com/Risdorn/Recipe-for-Rating
Objectives
- Predict food ratings accurately using machine learning techniques.
- Preprocess numeric, categorical, and text features for optimal model performance.
- Compare multiple models and identify the best approach for imbalanced datasets.
Features
-
Data Preprocessing:
- Numeric features scaled using StandardScaler.
- Categorical features encoded with OneHotEncoder.
- Text reviews transformed with TfidfVectorizer.
-
Model Training:
- Logistic Regression, Random Forest, SVM, KNN, AdaBoost, Balanced Bagging, and MLP classifiers.
- Pipelines for each model combining preprocessing and training steps.
-
Two-Stage Classification:
- Stage 1: Distinguish rating 5 from all others.
- Stage 2: Classify remaining ratings (0–4).
-
Evaluation & Comparison:
- Accuracy, precision, recall, and F1-score metrics.
- Randomized Search for hyperparameter tuning of selected models.
Technology Stack
- Language: Python
- Libraries: Scikit-learn, PyCaret, Pandas, NumPy, TfidfVectorizer
- Tools: Jupyter Notebook for experimentation, training, and visualization
Outcome
Recipe for Rating successfully predicted food ratings from a highly imbalanced dataset, achieving top Kaggle performance (score 0.78288). The two-stage classification approach improved prediction accuracy for minority classes, and comprehensive evaluation across multiple models provided insights into optimal preprocessing and model selection strategies.