Interpretable Real Estate Recommendations

by Kyle Polich

September 22, 2025

Summary — Interpretable Real Estate Recommendations (Data Skeptic podcast)

Host: Kyle Polich
Guest / Author: Kunal Mukherjee (Z-REX paper)

Overview

This episode discusses Z-REX, a method for producing human-interpretable explanations for graph neural network (GNN) based real-estate recommendations. The work addresses recommending novel regions (cities/neighborhoods) to users in the wake of COVID-driven migration, and focuses on explanations tailored to analysts and end users rather than to model internals. The approach combines attribute and structural perturbations on a tripartite user–listing–city graph to surface human-readable reasons for recommendations.

Motivation

  • Post-COVID mobility produced new real-estate hotspots (e.g., Frisco/Prosper near Dallas); users unfamiliar with these areas need discoverable recommendations plus explanations to build trust.
  • Standard recommendation scores (e.g., “9.72”) are not informative. Users prefer actionable, interpretable reasons (e.g., similar bedroom/bathroom mix, better schools, lower price).
  • Two audiences for explanations:
    • Model developers (technical, internal explanations).
    • Analysts/consultants/end-users (human-readable evidence why a recommendation matches preferences).

Problem formulation & data model

  • The real-estate domain is modeled as a tripartite graph of users, listings, and cities (regions): listings belong to cities, and users interact with listings (a small construction sketch follows this list).
  • Interactions are typed, with different signal strengths in increasing order: view < save < favorite < tour (a tour is a much stronger signal of intent than a passing view).
  • Dataset used in experiments: Seattle-area listings (region-specific feature importance).
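
As a concrete illustration of the data model, here is a minimal sketch that builds a tiny tripartite graph with networkx; the node names, listing attributes, and interaction weights are illustrative assumptions, not values from the paper or the episode.

```python
# Hypothetical sketch of the tripartite user-listing-city graph.
import networkx as nx

# Assumed relative signal strengths for interaction types (illustrative).
INTERACTION_WEIGHTS = {"view": 1.0, "save": 2.0, "favorite": 3.0, "tour": 4.0}

G = nx.Graph()

# Three node partitions: users, listings, cities.
G.add_node("user:alice", kind="user")
G.add_node("listing:123", kind="listing", beds=3, baths=2, price=450_000)
G.add_node("city:frisco", kind="city")

# Listings belong to cities (structural edge).
G.add_edge("listing:123", "city:frisco", relation="located_in")

# Users interact with listings; the interaction type sets the edge weight.
G.add_edge("user:alice", "listing:123",
           relation="interaction",
           interaction="tour",
           weight=INTERACTION_WEIGHTS["tour"])
```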

Z-REX approach (high-level)

  1. Base model: a GNN (simple graph convolutional network / GCN variant) to capture both node attributes and graph structure, enabling discovery of new regions via structural paths.
  2. Explanation generation:
    • Attribute perturbation: zero-out or perturb candidate features to measure how node representations and recommendations change (used for feature importance).
    • Structural perturbation: instead of random edge removal, construct a smaller, meaningful subgraph of “co-click” cities (cities that other, similar users also clicked). Perturbations operate on this data-driven subgraph to find important structural components efficiently and meaningfully (see the sketch after this list).
  3. Explanations presented as human-friendly evidence (e.g., “recommended because other users like you who clicked City A also clicked City B with similar attributes X, Y, Z”).
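
A minimal sketch of both perturbation ideas, under stated assumptions: `score_fn` stands in for a trained GNN scorer (its signature is invented here), node features live in a NumPy matrix `x`, and interactions are plain (user, city) click pairs; none of these names come from the Z-REX paper.

```python
import numpy as np

def attribute_importance(score_fn, x, edge_index, user, city):
    """Attribute perturbation: zero out one feature column at a time and
    measure how far the user->city recommendation score moves.
    A larger drop means the feature mattered more."""
    base = score_fn(x, edge_index, user, city)
    importance = {}
    for j in range(x.shape[1]):
        x_pert = x.copy()
        x_pert[:, j] = 0.0  # knock out feature j for every node
        importance[j] = base - score_fn(x_pert, edge_index, user, city)
    return importance

def coclick_subgraph(clicks, target_user, target_city):
    """Structural perturbation candidates: instead of removing random
    edges, restrict perturbations to "co-click" cities -- cities clicked
    by other users who also clicked the target city."""
    peers = {u for (u, c) in clicks if c == target_city and u != target_user}
    return {c for (u, c) in clicks if u in peers and c != target_city}
```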

Evaluation metrics

  • Fidelity:
    • Fidelity+ — removing an important subgraph should change the prediction.
    • Fidelity− — removing an unimportant subgraph should not change the prediction.
  • NDCG (Normalized Discounted Cumulative Gain at k): measures recommendation quality accounting for position in ranked list (relevant high-ranked items rewarded; irrelevant top items penalized).
  • Other typical recommender metrics are implied (accuracy, recall, F1), but the emphasis is on human-centric fidelity and NDCG (see the sketch after this list).
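
A minimal sketch of these quantities, assuming binary relevance labels and scalar prediction scores (the paper's exact formulations may differ):

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked items."""
    rel = np.asarray(relevances, dtype=float)[:k]
    return float(np.sum(rel / np.log2(np.arange(2, rel.size + 2))))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG normalized by the best possible ordering, so a relevant
    item ranked high is rewarded and an irrelevant top item is penalized."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def fidelity_plus(pred_full, pred_minus_important):
    """Fidelity+: removing the *important* subgraph should change the
    prediction, so a larger change is better."""
    return abs(pred_full - pred_minus_important)

def fidelity_minus(pred_full, pred_minus_unimportant):
    """Fidelity-: removing an *unimportant* subgraph should leave the
    prediction nearly unchanged, so a smaller change is better."""
    return abs(pred_full - pred_minus_unimportant)

# Example: relevance of a user's top-5 recommended cities (1 = relevant).
print(round(ndcg_at_k([1, 0, 1, 1, 0], k=5), 2))  # -> 0.91
```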

Key findings & insights

  • GNNs offer advantages over classical methods (XGBoost, CatBoost) by exploiting graph structure to discover non-obvious region links and improve discoverability of new regions.
  • Industry often prefers simpler models (histogram-based gradient boosting, XGBoost, CatBoost) due to latency, maintenance, and data-update cost — not always raw model capacity.
  • Feature importance is geographically dependent:
    • Seattle example: features like vacant land, carport, heating mattered; cooling, pools, fireplaces were less relevant there but could be important in other locales.
  • Structural perturbation using co-click cities reduces search complexity and yields more meaningful perturbations than random edge removal.
  • For explanations to be useful, the system must maintain user trust: early-stage convincing evidence is critical — if you lose trust early, correcting it later is hard.

Notable quotes / concise takeaways

  • “If you lose [the user’s] confidence once, the user might consider it as useless, even after it becomes relevant.” — emphasizes early-stage trust-building via explanations.
  • Two explanation audiences: model developers care about internal signals (neurons, activations); analysts want evidence in domain terms (features, peer behavior).

Topics discussed

  • Real-estate recommendation challenges post-COVID.
  • Tripartite graph modeling (users, listings, cities).
  • Interaction types and their relative signal strengths.
  • GNNs vs. classical ML in recommender systems (trade-offs).
  • Explainability techniques for GNNs: attribute and structural perturbations.
  • Data-driven structural perturbation via co-click city subgraphs.
  • Fidelity metrics and NDCG for evaluation.
  • Region-dependent feature importance and the need for human verification.
  • Practical deployment concerns: latency, model maintenance, feature selection.

Action items / Recommendations (for practitioners)

  • Model & data:
    • Represent real-estate data as a tripartite graph (user–listing–city) to capture structure.
    • Include interaction types as weighted signals (differentiate views, saves, favorites, and tours).
    • Perform feature normalization and selection early to reduce noise and memory/training costs.
  • Explainability:
    • Use both attribute perturbation (zeroing features) and data-driven structural perturbation (co-click subgraphs) for more meaningful explanations.
    • Evaluate explanations with fidelity+ and fidelity−, alongside ranking metrics (NDCG@k).
    • Tailor explanations for analysts/end-users: present features and peer behavior evidence rather than internal neural activations.
    • Run human verification studies to validate explanation usefulness, and design them carefully (user tasks, A/B tests).
  • Deployment:
    • Consider engineering trade-offs: latency and data pipeline complexity may favor simpler models in production; explore hybrid approaches (e.g., simple production model + offline GNN explainers or periodic GNN suggestions).
    • Be region-aware: feature importance can shift dramatically by geography — localized models or region-specific feature weighting may help.
  • Future modeling improvements:
    • Try stronger GNN architectures (GAT, Graph Transformers) and compare recommendation quality.
    • Refine perturbation thresholds and weighting of interaction types to better separate hard vs soft negatives.

Future directions (from the guest)

  • Explore different GNN architectures (GAT, transformers) to maximize recommendation capability before explanation steps.
  • Improve feature and structural perturbation methods (thresholding, user/feature selection).
  • Implement human-subject verification studies to measure explanation effectiveness.
  • Consider weighting different interaction types to improve modeling and interpretation of hard/soft negatives.

This summary captures the central ideas of Z-REX: modeling the real-estate recommendation problem as a structured graph, creating human-oriented explanations via attribute and meaningful structural perturbations (co-click cities), and balancing technical gains with practical deployment constraints and human trust needs.