California House Price Prediction Model 🏠

License: MIT · Python 3.8+ · scikit-learn

A machine learning model that predicts median house values for California districts from features such as location, housing age, size, and proximity to the ocean. It uses a Random Forest Regressor trained on the California Housing dataset and reaches a test-set RMSE of roughly $47,000-49,000.

📊 Model Overview

  • Model Type: Random Forest Regressor (scikit-learn)
  • Task: Regression (Predict median house value)
  • Training Data: California Housing dataset (20,640 instances)
  • Performance: Final RMSE on test set: ~$47,000-49,000
  • Features: 8 numerical features + 1 categorical feature (ocean_proximity)
  • Target: Median house value in California districts (in USD)
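
Because RMSE is the square root of the mean squared difference between predicted and actual values, it is expressed in the same unit as the target (USD here). The following is a minimal sketch of that calculation; the `rmse` helper and the numbers are illustrative assumptions, not output from this model.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values only (not taken from the model or the dataset)
actual_values = [200_000.0, 350_000.0, 150_000.0]
predicted_values = [230_000.0, 310_000.0, 150_000.0]
print(f"RMSE: ${rmse(actual_values, predicted_values):,.2f}")
```

Large individual errors are squared before averaging, so a few badly mispriced districts raise RMSE more than many small misses would.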

🎯 Use Cases

  • Real estate price estimation
  • Housing market analysis
  • Property valuation for California regions
  • Educational demonstrations of regression modeling

📥 Installation

Clone the repository

git clone https://huggingface.co/nitish-niraj/house-price-prediction
cd house-price-prediction

Install dependencies

pip install -r requirements.txt

🚀 Quick Start

Using the Python API

from inference import load_model

# Load the model
predictor = load_model()

# Prepare input data
house_data = {
    'longitude': -122.23,
    'latitude': 37.88,
    'housing_median_age': 41.0,
    'total_rooms': 880.0,
    'total_bedrooms': 129.0,
    'population': 322.0,
    'households': 126.0,
    'median_income': 8.3252,
    'ocean_proximity': 'NEAR BAY'
}

# Make prediction
predicted_price = predictor.predict(house_data)
print(f"Predicted house price: ${predicted_price[0]:,.2f}")

Using the HousePricePredictor class directly

from inference import HousePricePredictor

predictor = HousePricePredictor()
predictor.load()

# Predict single house price
price = predictor.predict_single(
    longitude=-122.23,
    latitude=37.88,
    housing_median_age=41.0,
    total_rooms=880.0,
    total_bedrooms=129.0,
    population=322.0,
    households=126.0,
    median_income=8.3252,
    ocean_proximity='NEAR BAY'
)
print(f"Predicted price: ${price:,.2f}")

Batch predictions

import pandas as pd
from inference import load_model

predictor = load_model()

# Prepare multiple houses
houses_df = pd.DataFrame([
    {'longitude': -122.23, 'latitude': 37.88, 'housing_median_age': 41.0,
     'total_rooms': 880.0, 'total_bedrooms': 129.0, 'population': 322.0,
     'households': 126.0, 'median_income': 8.3252, 'ocean_proximity': 'NEAR BAY'},
    {'longitude': -122.22, 'latitude': 37.86, 'housing_median_age': 21.0,
     'total_rooms': 7099.0, 'total_bedrooms': 1106.0, 'population': 2401.0,
     'households': 1138.0, 'median_income': 8.3014, 'ocean_proximity': 'NEAR BAY'},
])

# Predict all at once
predictions = predictor.predict(houses_df)
for i, price in enumerate(predictions):
    print(f"House {i+1}: ${price:,.2f}")

📋 Input Features

The model requires the following features for prediction:

| Feature | Type | Description | Example |
| --- | --- | --- | --- |
| longitude | float | Longitude coordinate of the district | -122.23 |
| latitude | float | Latitude coordinate of the district | 37.88 |
| housing_median_age | float | Median age of houses in the district | 41.0 |
| total_rooms | float | Total number of rooms in the district | 880.0 |
| total_bedrooms | float | Total number of bedrooms in the district | 129.0 |
| population | float | Total population in the district | 322.0 |
| households | float | Total number of households in the district | 126.0 |
| median_income | float | Median income (in tens of thousands USD) | 8.3252 |
| ocean_proximity | string | Proximity to ocean | One of: <1H OCEAN, INLAND, NEAR OCEAN, NEAR BAY, ISLAND |
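
Checking a record against this feature list before calling the predictor can catch typos in key names or category values early. The sketch below is one possible way to do that; `validate_house_data` is a hypothetical helper and is not part of `inference.py`.

```python
REQUIRED_NUMERIC = (
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
)
OCEAN_PROXIMITY_VALUES = {"<1H OCEAN", "INLAND", "NEAR OCEAN", "NEAR BAY", "ISLAND"}

def validate_house_data(house_data):
    """Return a list of problems found in an input record (empty if valid)."""
    problems = []
    for name in REQUIRED_NUMERIC:
        if name not in house_data:
            problems.append(f"missing feature: {name}")
            continue
        value = house_data[name]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be numeric, got {type(value).__name__}")
    proximity = house_data.get("ocean_proximity")
    if proximity not in OCEAN_PROXIMITY_VALUES:
        problems.append(
            f"ocean_proximity must be one of {sorted(OCEAN_PROXIMITY_VALUES)}, got {proximity!r}"
        )
    return problems

example = {
    "longitude": -122.23, "latitude": 37.88, "housing_median_age": 41.0,
    "total_rooms": 880.0, "total_bedrooms": 129.0, "population": 322.0,
    "households": 126.0, "median_income": 8.3252, "ocean_proximity": "NEAR BAY",
}
print(validate_house_data(example) or "record looks valid")
```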

🎨 Gradio Demo

A Gradio web interface is included in the notebook for interactive predictions:

# Run the Gradio demo (from the notebook)
import gradio as gr
# See housepriceprediction.ipynb for the full demo code

📈 Model Training Details

Training Process

  1. Data Preprocessing:

    • Handled missing values using median imputation
    • Created stratified train-test split (80-20) based on income categories
    • Feature engineering: Added derived features (rooms_per_household, etc.)
    • Standardized numerical features using StandardScaler
    • One-hot encoded categorical feature (ocean_proximity)
  2. Model Selection:

    • Compared Linear Regression, Decision Tree, and Random Forest
    • Random Forest showed best performance
  3. Hyperparameter Tuning:

    • Used GridSearchCV with 5-fold cross-validation
    • Optimized parameters: n_estimators, max_features, bootstrap
    • Best parameters: {'max_features': 8, 'n_estimators': 30}
  4. Evaluation:

    • Primary metric: RMSE (Root Mean Squared Error)
    • Cross-validation RMSE: ~$49,000
    • Final test set RMSE: ~$47,000-49,000
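
The steps above can be sketched with scikit-learn roughly as follows. This is not the notebook's code: it runs on a small synthetic DataFrame with only a subset of the columns, and the income bin edges, the reduced parameter grid, and the pipeline step names (`preprocess`, `forest`) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real dataset (illustrative values only)
rng = np.random.default_rng(42)
n = 300
housing = pd.DataFrame({
    "median_income": rng.uniform(0.5, 15.0, n),
    "total_rooms": rng.uniform(100, 5000, n),
    "households": rng.uniform(50, 1500, n),
    "total_bedrooms": rng.uniform(20, 1000, n),
    "ocean_proximity": rng.choice(["INLAND", "NEAR BAY", "<1H OCEAN"], n),
})
housing.loc[::25, "total_bedrooms"] = np.nan  # simulate missing values
housing["median_house_value"] = (
    40_000 * housing["median_income"] + rng.normal(0, 20_000, n)
)

# Step 1: derived feature and stratified 80-20 split on income categories
housing["rooms_per_household"] = housing["total_rooms"] / housing["households"]
income_cat = pd.cut(
    housing["median_income"], bins=[0, 1.5, 3.0, 4.5, 6.0, np.inf], labels=False
)
X = housing.drop(columns="median_house_value")
y = housing["median_house_value"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=income_cat, random_state=42
)

# Step 1 (continued): median imputation, scaling, one-hot encoding
numeric_features = [
    "median_income", "total_rooms", "households",
    "total_bedrooms", "rooms_per_household",
]
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["ocean_proximity"]),
])

# Steps 2-3: Random Forest tuned with 5-fold grid search (reduced grid)
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestRegressor(random_state=42)),
])
param_grid = {"forest__n_estimators": [10, 30], "forest__max_features": [2, 4]}
search = GridSearchCV(model, param_grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)

# Step 4: RMSE from cross-validation and on the held-out test set
test_rmse = np.sqrt(np.mean((search.predict(X_test) - y_test) ** 2))
print("Best parameters:", search.best_params_)
print(f"Cross-validation RMSE: {-search.best_score_:,.0f}")
print(f"Test RMSE: {test_rmse:,.0f}")
```

Wrapping preprocessing and the regressor in one `Pipeline` keeps the imputer and scaler fitted only on each training fold during cross-validation, which avoids leaking validation data into the preprocessing statistics.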

Feature Importance

Top features contributing to predictions (from the trained model):

  1. Median Income
  2. Longitude
  3. Latitude
  4. Housing Median Age
  5. Ocean Proximity
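
A fitted scikit-learn `RandomForestRegressor` exposes one score per input column through its `feature_importances_` attribute; a ranking like the one above comes from pairing those scores with column names and sorting. The sketch below shows only the pairing and sorting step. `rank_features` is a hypothetical helper, and the scores are placeholders, not the trained model's values.

```python
def rank_features(feature_names, importances, top_n=5):
    """Pair each feature name with its importance; return the top_n, highest first."""
    if len(feature_names) != len(importances):
        raise ValueError("feature_names and importances must have the same length")
    ranked = sorted(
        zip(feature_names, importances), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:top_n]

# Placeholder scores for illustration only (not the trained model's values)
names = ["latitude", "median_income", "population", "longitude"]
scores = [0.15, 0.45, 0.10, 0.30]
for name, score in rank_features(names, scores, top_n=3):
    print(f"{name}: {score:.2f}")
```

With one-hot encoding, each `ocean_proximity` category gets its own importance score, so the names passed in need to match the columns produced by the preprocessing step rather than the raw input columns.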

📦 Model Files

  • house_price_model.joblib (80+ MB) - Trained Random Forest model
  • preprocessing_pipeline.joblib (2+ KB) - Data preprocessing pipeline
  • inference.py - Python inference API
  • housepriceprediction.ipynb - Training notebook with Gradio demo

🔧 Requirements

  • Python 3.8+
  • scikit-learn >= 1.3.0
  • pandas >= 2.0.0
  • numpy >= 1.24.0
  • joblib >= 1.3.0
  • gradio >= 4.0.0 (optional, for demo)

See requirements.txt for complete dependencies.
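
To check an existing environment against the minimum versions listed above, something like the following could be used. It is a simplified sketch: `check_requirements` and `parse_version` are hypothetical helpers, and the version comparison only looks at leading numeric components, so pre-release tags are not handled precisely.

```python
from importlib import metadata

# Minimum versions copied from the list above (gradio omitted because it is optional)
MINIMUM_VERSIONS = {
    "scikit-learn": "1.3.0",
    "pandas": "2.0.0",
    "numpy": "1.24.0",
    "joblib": "1.3.0",
}

def parse_version(version):
    """Turn '1.24.0' into (1, 24, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: message} for packages that are missing or too old."""
    problems = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems[package] = f"not installed (need >= {minimum})"
            continue
        if parse_version(installed) < parse_version(minimum):
            problems[package] = f"found {installed}, need >= {minimum}"
    return problems

print(check_requirements() or "All minimum versions satisfied")
```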

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Feel free to:

  • Report bugs
  • Suggest new features
  • Submit pull requests

📚 References

  • Dataset: California Housing Dataset
  • Inspired by: Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron

👤 Author

nitish-niraj

🌟 Acknowledgments

  • California Housing dataset from the 1990 U.S. Census
  • scikit-learn community for excellent ML tools
  • Hugging Face for model hosting platform

Note: This model is trained on 1990 census data and is intended for educational and demonstration purposes. For real-world applications, consider using more recent data and additional features.
