Healthy Friction in Job Recommender Systems


by Kyle Polich

26 min · February 2, 2026

Overview of Data Skeptic: Healthy Friction in Job Recommender Systems

This episode of Data Skeptic (Recommender Systems series) features Roan Schadangerhout, a PhD student at Maastricht University, discussing his paper "Creating Healthy Friction: Determining Stakeholder Requirements of Job Recommendation Explanations." The conversation focuses on explainable, multi‑stakeholder job recommender systems: the explanation formats tested (text, bar charts, graph visualizations), how those explanations were generated (knowledge graphs plus LLMs), an in‑lab user study comparing real vs. random explanations, and practical design implications for job portals and HR tools.

Key takeaways

  • Multi‑stakeholder recommender systems must balance the needs of job seekers, recruiters, and company HR; explanations are consumed by different audiences with different expectations.
  • Lay users strongly prefer plain textual explanations over bar charts or raw graph visualizations.
  • Users generally treat system explanations as another information source, not as the decisive reason for choosing a job/candidate; they still rely on their own judgment.
  • In a small study (N=30, role‑play), participants often selected suboptimal matches, frequently choosing the second‑best ground‑truth option, and differences between real and random explanations were smaller than expected.
  • Practical approach: use knowledge graphs to represent structured data and LLMs to translate graph evidence into user‑friendly textual explanations; but alignment between CV text and graph structure is important.

Study design and metrics

  • Participants: 30 total — 10 job seekers, 10 recruiters, 10 company/HR representatives.
  • Method: role‑play interface where participants were given a resume or job listing and asked to choose the best match from five options after inspecting explanations.
  • Comparison: real (explainable recommender outputs) vs. random (nonsense) explanations as a baseline.
  • Objective metrics: correctness (choice vs. ground truth derived from dataset annotations like applied/interviewed/hired) and time to decision; a tabulation sketch follows this list.
  • Subjective metrics: perceived trustworthiness, usefulness, transparency/understandability.
  • Limitations: small N (limited statistical power), role‑play rather than live production deployment; ground truth from historical annotations can be fuzzy.
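
To make these metrics concrete, here is a minimal sketch of how correctness and time to decision could be tabulated per explanation condition. The session records, field names, and numbers are hypothetical, not the study's actual instrumentation.

```python
# Hypothetical session log: one record per participant decision.
# "condition" distinguishes real vs. random explanations; "chosen"/"best" are option ids.
from statistics import mean

sessions = [
    {"condition": "real",   "chosen": "job_2", "best": "job_2", "seconds": 41.0},
    {"condition": "real",   "chosen": "job_4", "best": "job_1", "seconds": 58.5},
    {"condition": "random", "chosen": "job_3", "best": "job_3", "seconds": 47.2},
    {"condition": "random", "chosen": "job_5", "best": "job_2", "seconds": 63.9},
]

def summarise(condition: str) -> dict:
    """Accuracy against ground truth and mean time to decision for one condition."""
    subset = [s for s in sessions if s["condition"] == condition]
    return {
        "n": len(subset),
        "accuracy": mean(1.0 if s["chosen"] == s["best"] else 0.0 for s in subset),
        "mean_seconds": mean(s["seconds"] for s in subset),
    }

for condition in ("real", "random"):
    print(condition, summarise(condition))
```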

Explanation types tested & user reactions

  • Textual explanations
    • Format: short running text summarizing why candidate ↔ job are a fit (education, experience, skills, context).
    • Reaction: preferred by most lay users and recruiters; easiest to understand and to use in follow‑up conversations.
  • Bar charts
    • Purpose: show relative feature contributions (an industry‑standard format); a rendering sketch follows this list.
    • Reaction: generally disliked/confusing for lay users; served mainly as supportive summary when paired with text.
  • Graph‑based explanations (knowledge graph visualizations)
    • Format: nodes/edges showing paths connecting candidate attributes to job requirements; thick edges used to indicate important paths.
    • Reaction: divisive — HR/company reps and data‑savvy users liked the dense information-at-a-glance; many recruiters and job seekers found graphs overwhelming without coaching.
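
To make the contrast between the first two formats concrete, the sketch below renders one hypothetical set of feature contributions both as a short textual rationale (the format lay users preferred) and as a bar‑chart‑style view (rendered as ASCII here to stay dependency‑free). The feature names and weights are invented for illustration.

```python
# One hypothetical set of per-feature match scores, rendered two ways.
contributions = {"skills overlap": 0.62, "relevant experience": 0.25, "education": 0.13}

# 1. Textual explanation: a short recruiter-style rationale.
top = max(contributions, key=contributions.get)
others = ", ".join(k for k in contributions if k != top)
print(f"This looks like a strong match mainly because of {top}, "
      f"with additional support from {others}.")

# 2. Bar-chart view of the same evidence (ASCII stand-in for a plotted chart).
for feature, weight in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{feature:<22} {'#' * round(weight * 40)} {weight:.2f}")
```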

Knowledge graph & model details

  • Knowledge graph construction
    • Start from tabular HR/recruiter data (CV fields, vacancies, annotations).
    • Define an ontology and run inference to add derived relations (e.g., job role → implied skills).
    • Represent the graph in both directions (candidate→vacancy and vacancy→candidate), make separate predictions for each direction, then combine them; see the construction sketch after this list.
  • Inputs and embeddings
    • CVs/vacancies can be stored as plain‑text nodes or as nodes with embeddings (e.g., TF‑IDF or BERT).
    • Storing embeddings in nodes can cause mismatches between free text content and structured graph features.
  • Explanation generation pipeline
    • Use graph paths/edges as structured evidence, then feed a JSON/graph representation into an LLM to produce lay‑friendly textual explanations; see the prompting sketch after this list.
  • Skills & taxonomy
    • Taxonomies (e.g., ESCO) are used where possible, but company‑specific or out‑of‑vocabulary skills are learned by models.
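
As a concrete illustration of the construction steps above (start from tabular rows, add ontology‑inferred relations, follow paths between candidate and vacancy), here is a minimal sketch using networkx. The rows, the ontology rule, and the relation names are invented and are not the paper's schema.

```python
# Minimal knowledge-graph construction sketch (illustrative schema, not the paper's).
import networkx as nx

# 1. Start from tabular HR data: one candidate row and one vacancy row.
candidate = {"id": "cand_1", "skills": ["python", "sql"], "last_role": "data analyst"}
vacancy = {"id": "vac_1", "title": "data scientist", "required_skills": ["python", "statistics"]}

# 2. Toy ontology rule: a job role implies additional skills.
role_implies = {"data analyst": ["statistics", "reporting"]}

G = nx.DiGraph()
for skill in candidate["skills"]:
    G.add_edge(candidate["id"], skill, relation="has_skill")
for skill in role_implies.get(candidate["last_role"], []):
    G.add_edge(candidate["id"], skill, relation="has_skill", inferred=True)
for skill in vacancy["required_skills"]:
    G.add_edge(skill, vacancy["id"], relation="required_by")

# 3. Candidate→vacancy paths are the structured evidence behind an explanation;
#    the reverse direction (vacancy→candidate) would be built and scored analogously,
#    and the two predictions combined.
evidence_paths = list(nx.all_simple_paths(G, candidate["id"], vacancy["id"], cutoff=2))
print(evidence_paths)  # [['cand_1', 'python', 'vac_1'], ['cand_1', 'statistics', 'vac_1']]
```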
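And a sketch of the explanation‑generation step: the evidence paths are serialized as JSON and wrapped in a prompt for an LLM. The prompt wording and the call_llm stub are placeholders, not the prompt or model used in the paper.

```python
# Turn graph evidence into a lay-friendly explanation via an LLM (prompt sketch only).
import json

evidence = [
    {"path": ["cand_1", "python", "vac_1"], "relations": ["has_skill", "required_by"]},
    {"path": ["cand_1", "statistics", "vac_1"], "relations": ["has_skill (inferred)", "required_by"]},
]

prompt = (
    "You are explaining a job recommendation to a job seeker with no data background.\n"
    "Using only the evidence below, write 2-3 plain sentences on why this vacancy fits.\n"
    "Do not mention anything that is not supported by the evidence.\n\n"
    f"Evidence:\n{json.dumps(evidence, indent=2)}"
)

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM client is actually used."""
    raise NotImplementedError

print(prompt)  # inspect the prompt; swap call_llm for a real client to generate the text
```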

Results & surprising findings

  • Real vs. random explanations: only small trends favored real explanations on subjective measures, and none reached statistical significance; differences in objective decision accuracy were modest.
  • User behavior: participants consumed explanations as informational context but relied on their own judgment, so explanations rarely dictated decisions.
  • Common error: participants frequently selected a second‑best option rather than the highest ground‑truth match.
  • Bar charts provided little additional value for lay users; textual explanations were most actionable.

Design implications & practitioner recommendations

  • Prioritize clear textual explanations for lay users: short, recruiter‑style rationales that mirror how humans explain matches.
  • Allow personalization of explanation verbosity and format:
    • Some recruiters prefer long text (for follow‑up conversation content); others prefer concise summaries.
    • Consider using personality or role cues to adapt explanation style (bulleted vs. running text, length, persuasive vs. decision‑support framing).
  • Use LLMs to convert structured graph evidence into human language, but ensure fidelity/alignment between the original CV/text and the generated explanation to avoid hallucination or mismatch; a fidelity‑check sketch follows this list.
  • Reserve graph visualizations for users with data/visualization literacy (HR analysts, power users); provide guided interpretation for others.
  • Don’t assume explanations will override user judgment; treat them as decision support, not decision‑making.
  • Evaluate explanations in realistic, live contexts (A/B tests with real job seekers & recruiters) and include fairness checks (gender, location).
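
In the spirit of the fidelity/alignment recommendation above, here is a naive sketch of a post‑generation check that flags claim terms in an explanation that cannot be found in the source CV or vacancy text. A real system would need proper entity extraction and synonym handling; the texts and term list below are invented.

```python
# Naive fidelity check: flag explanation terms with no support in the source texts.
import re

cv_text = "Data analyst with 4 years of experience in Python and SQL reporting."
vacancy_text = "We are looking for a data scientist comfortable with Python and statistics."
explanation = "The candidate fits because of their Python experience and Java background."

source = f"{cv_text} {vacancy_text}".lower()

# Hypothetical claim terms; in practice these would be extracted automatically from
# the generated explanation (skills, job titles, years of experience, ...).
claim_terms = ["python", "java", "statistics"]

unsupported = [t for t in claim_terms if not re.search(rf"\b{re.escape(t)}\b", source)]
print("Unsupported claims:", unsupported)  # ['java'] -> regenerate or route for review
```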

Limitations & cautions

  • Small sample size (N=30) reduces statistical confidence; reported trends may not generalize.
  • Role‑play setup differs from real user behavior when searching for employment (stakes, effort, feedback loops).
  • Ground truth matching labels (applied/interviewed/hired) are noisy proxies for "correctness" in hiring decisions.
  • LLM‑based text generation requires careful prompt engineering and verification to avoid misleading or incorrect explanations.

Future work mentioned

  • Automated knowledge graph construction from CVs and job postings using LLMs — current work in progress with evaluation of graph quality.
  • Larger, real‑world deployments with company partners to validate effects on actual hiring outcomes.
  • Fairness analyses focusing on gender and location bias.
  • Investigating how personality traits map to explanation format/length preferences to enable adaptive explanations.

Notable quotes / concise insights

  • Multi‑stakeholder recommender systems: "A recommender system where multiple stakeholder needs need to be balanced" (job seekers, recruiters, HR).
  • On lay audiences: "If you're dealing with data scientists, SHAP/LIME are fine. Not with your average Joe."
  • On user behavior: participants "used the explanations as an information source, but not as a reason to come to a certain decision."

Where to follow / sources

  • Roan: LinkedIn and Google Scholar (conference updates and publications). Links provided in the episode show notes.

If you’re building or improving job recommendation UX: start with clear, recruiter‑style textual explanations; provide options for verbosity; align explanations tightly to source CV/text; run live experiments and fairness audits before wide release.