Overview of Data Skeptic: Healthy Friction in Job Recommender Systems
This episode of Data Skeptic (Recommender Systems series) features Roan Schellingerhout, a PhD student at Maastricht University, discussing his paper "Creating Healthy Friction: Determining Stakeholder Requirements of Job Recommendation Explanations." The conversation covers explainable, multi‑stakeholder job recommender systems: the explanation formats tested (text, bar charts, graph visualizations), how those explanations were generated (knowledge graphs plus LLMs), an in‑lab user study comparing real vs. random explanations, and practical design implications for job portals and HR tools.
Key takeaways
- Multi‑stakeholder recommender systems must balance the needs of job seekers, recruiters, and company HR; explanations are consumed by different audiences with different expectations.
- Lay users strongly prefer plain textual explanations over bar charts or raw graph visualizations.
- Users generally treat system explanations as another information source, not as the decisive reason for choosing a job/candidate; they still rely on their own judgment.
- In a small role‑play study (N=30), participants often selected suboptimal matches (typically the second‑best ground‑truth option), and differences between real and random explanations were smaller than expected.
- Practical approach: use knowledge graphs to represent structured data and LLMs to translate graph evidence into user‑friendly textual explanations; but alignment between CV text and graph structure is important.
Study design and metrics
- Participants: 30 total — 10 job seekers, 10 recruiters, 10 company/HR representatives.
- Method: role‑play interface where participants were given a resume or job listing and asked to choose the best match from five options after inspecting explanations.
- Comparison: real (explainable recommender outputs) vs. random (nonsense) explanations as a baseline.
- Objective metrics: correctness (choice vs. ground truth derived from dataset annotations like applied/interviewed/hired) and time to decision (a small scoring sketch follows this list).
- Subjective metrics: perceived trustworthiness, usefulness, transparency/understandability.
- Limitations: small N (limited statistical power), role‑play rather than live production deployment; ground truth from historical annotations can be fuzzy.
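As a rough illustration of how the objective metrics above can be scored, the sketch below computes per‑condition decision accuracy and median time to decision. It assumes a hypothetical tabular layout of trial records; the study's actual data format is not described in the episode.

```python
# Minimal sketch of the objective-metric computation described above.
# The column names and records are hypothetical, not the study's data layout.
import pandas as pd

trials = pd.DataFrame([
    # participant, condition, chosen option, ground-truth best option, seconds
    {"pid": 1, "condition": "real",   "chosen": "B", "best": "B", "seconds": 41.0},
    {"pid": 1, "condition": "random", "chosen": "C", "best": "A", "seconds": 55.0},
    {"pid": 2, "condition": "real",   "chosen": "D", "best": "B", "seconds": 38.0},
    {"pid": 2, "condition": "random", "chosen": "A", "best": "A", "seconds": 47.0},
])

trials["correct"] = trials["chosen"] == trials["best"]

# Objective metrics per condition: decision accuracy and median time to decision.
summary = trials.groupby("condition").agg(
    accuracy=("correct", "mean"),
    median_seconds=("seconds", "median"),
)
print(summary)
```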
Explanation types tested & user reactions
- Textual explanations
- Format: short running text summarizing why the candidate and the job are a good fit (education, experience, skills, context).
- Reaction: preferred by most lay users and recruiters; easiest to understand and to use in follow‑up conversations.
- Bar charts
- Purpose: show relative feature contributions (industry standard).
- Reaction: generally disliked/confusing for lay users; served mainly as supportive summary when paired with text.
- Graph‑based explanations (knowledge graph visualizations)
- Format: nodes/edges showing paths connecting candidate attributes to job requirements; thick edges indicate important paths (see the rendering sketch after this list).
- Reaction: divisive; HR/company reps and data‑savvy users liked the dense, at‑a‑glance information, while many recruiters and job seekers found graphs overwhelming without coaching.
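One plausible way to produce graph‑based explanations of this kind is to render knowledge‑graph paths with edge thickness encoding importance. The sketch below uses networkx and matplotlib; the nodes, edges, and importance scores are illustrative assumptions, not taken from the paper.

```python
# Render a small explanation graph where edge thickness encodes path importance.
# Nodes, edges, and importance scores are hypothetical.
import matplotlib.pyplot as plt
import networkx as nx

G = nx.DiGraph()
# Paths connecting candidate attributes to job requirements,
# each with an "importance" score in [0, 1].
G.add_edge("Candidate", "Python", importance=0.9)
G.add_edge("Python", "Data Engineer vacancy", importance=0.8)
G.add_edge("Candidate", "MSc Statistics", importance=0.6)
G.add_edge("MSc Statistics", "Data Engineer vacancy", importance=0.4)

pos = nx.spring_layout(G, seed=42)
widths = [4 * G[u][v]["importance"] for u, v in G.edges()]  # thick = important

nx.draw_networkx(G, pos, width=widths, node_color="lightsteelblue",
                 node_size=2200, font_size=8, arrows=True)
plt.axis("off")
plt.tight_layout()
plt.show()
```

Mapping importance to line width is what lets the strongest evidence paths stand out at a glance, which is the property these visualizations rely on.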
Knowledge graph & model details
- Knowledge graph construction
- Start from tabular HR/recruiter data (CV fields, vacancies, annotations).
- Define an ontology and run inference to add derived relations (e.g., job role → implied skills).
- Model relations in both directions (candidate→vacancy and vacancy→candidate), make separate predictions for each, then combine them.
- Inputs and embeddings
- CVs/vacancies can be stored as plain text nodes or nodes with embeddings (TF/BERT).
- Storing embeddings in nodes can cause mismatches between free text content and structured graph features.
- Explanation generation pipeline
- Use graph paths/edges as structured evidence, then feed a JSON/graph representation into an LLM to produce lay‑friendly textual explanations (a minimal end‑to‑end sketch follows this list).
- Skills & taxonomy
- Taxonomies (e.g., ESCO) are used where possible, but company‑specific or out‑of‑vocabulary skills are learned by models.
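Putting the pieces in this section together, the sketch below walks through one possible pipeline: a tiny knowledge graph built from tabular HR data, a simple inference rule that adds implied skills, graph evidence serialized to JSON, and a prompt asking an LLM to verbalize only those facts. All node names, the inference rule, and the prompt wording are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of the pipeline described in this section: tabular HR data -> small
# knowledge graph -> inferred relations -> graph evidence serialized to JSON ->
# prompt for an LLM that writes the lay-friendly explanation.
import json
import networkx as nx

# 1. Knowledge graph from (simplified) tabular data.
kg = nx.DiGraph()
kg.add_edge("candidate:42", "skill:python", relation="has_skill")
kg.add_edge("candidate:42", "role:data_analyst", relation="worked_as")
kg.add_edge("vacancy:7", "skill:sql", relation="requires_skill")
kg.add_edge("vacancy:7", "skill:python", relation="requires_skill")

# 2. Ontology-style inference: a past job role implies certain skills.
ROLE_IMPLIES = {"role:data_analyst": ["skill:sql"]}
for role, skills in ROLE_IMPLIES.items():
    for holder, _ in kg.in_edges(role):
        for skill in skills:
            kg.add_edge(holder, skill, relation="has_skill_inferred")

# 3. Candidate -> vacancy evidence: shared skills found in the graph.
def evidence(kg, candidate, vacancy):
    cand_skills = {t for _, t, d in kg.out_edges(candidate, data=True)
                   if d["relation"].startswith("has_skill")}
    req_skills = {t for _, t, d in kg.out_edges(vacancy, data=True)
                  if d["relation"] == "requires_skill"}
    return {"candidate": candidate, "vacancy": vacancy,
            "matched_skills": sorted(cand_skills & req_skills)}

facts = evidence(kg, "candidate:42", "vacancy:7")

# 4. Hand the structured evidence to an LLM as JSON; the model only verbalizes
#    what is already in the graph, which limits room for hallucination.
prompt = (
    "Explain in two plain-language sentences why this candidate fits this "
    "vacancy. Use only the facts in the JSON below, and do not invent skills.\n"
    + json.dumps(facts, indent=2)
)
print(prompt)
```

Constraining the LLM to verbalize a fixed JSON payload is one way to keep the generated text aligned with the graph, which ties into the fidelity concerns discussed below.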
Results & surprising findings
- Real vs. random explanations: only small, statistically non‑significant trends favored real explanations on subjective measures; differences in objective decision accuracy were modest.
- User behavior: participants consumed explanations as informational context but relied on their own judgment, so explanations rarely dictated decisions.
- Common error: participants frequently selected a second‑best option rather than the highest ground‑truth match.
- Bar charts provided little additional value for lay users; textual explanations were most actionable.
Design implications & practitioner recommendations
- Prioritize clear textual explanations for lay users: short, recruiter‑style rationales that mirror how humans explain matches.
- Allow personalization of explanation verbosity and format:
- Some recruiters prefer long text (for follow‑up conversation content); others prefer concise summaries.
- Consider using personality or role cues to adapt explanation style (bulleted vs. running text, length, persuasive vs. decision‑support framing).
- Use LLMs to convert structured graph evidence into human language, but ensure fidelity/alignment between the original CV/text and the generated explanation to avoid hallucination or mismatch (see the fidelity‑check sketch after this list).
- Reserve graph visualizations for users with data/visualization literacy (HR analysts, power users); provide guided interpretation for others.
- Don’t assume explanations will override user judgment; treat them as decision support, not decision making.
- Evaluate explanations in realistic, live contexts (A/B tests with real job seekers & recruiters) and include fairness checks (gender, location).
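To make the fidelity/alignment recommendation concrete, here is a minimal post‑generation check that flags skills mentioned in an explanation but absent from the CV/graph evidence. The vocabulary, texts, and substring matching are simplifications; a production system would need proper entity linking.

```python
# Minimal fidelity check: flag generated explanations that mention skills
# absent from the source evidence. Vocabulary and texts are hypothetical.
def hallucinated_skills(explanation: str, allowed_skills: set[str]) -> set[str]:
    """Return skills mentioned in the explanation that are not in the evidence."""
    known_vocabulary = {"python", "sql", "java", "scala", "project management"}
    mentioned = {s for s in known_vocabulary if s in explanation.lower()}
    return mentioned - {s.lower() for s in allowed_skills}

evidence_skills = {"Python", "SQL"}
generated = "The candidate fits well thanks to strong Python, SQL and Java skills."

unsupported = hallucinated_skills(generated, evidence_skills)
if unsupported:
    print(f"Regenerate or edit: unsupported skills mentioned -> {sorted(unsupported)}")
```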
Limitations & cautions
- Small sample size (N=30) reduces statistical confidence; reported trends may not generalize.
- The role‑play setup differs from real job‑search behavior (stakes, effort, feedback loops).
- Ground truth matching labels (applied/interviewed/hired) are noisy proxies for "correctness" in hiring decisions.
- LLM‑based text generation requires careful prompt engineering and verification to avoid misleading or incorrect explanations.
Future work mentioned
- Automated knowledge graph construction from CVs and job postings using LLMs; work in progress, including evaluation of graph quality (see the extraction sketch after this list).
- Larger, real‑world deployments with company partners to validate effects on actual hiring outcomes.
- Fairness analyses focusing on gender and location bias.
- Investigating how personality traits map to explanation format/length preferences to enable adaptive explanations.
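For the automated graph‑construction direction, a common pattern is to prompt an LLM to extract (subject, relation, object) triples from CV text and load them into a graph. The sketch below assumes a placeholder `call_llm` client and an illustrative prompt; it is not the authors' in‑progress system.

```python
# Sketch: LLM-based triple extraction from free-text CVs into a graph.
# `call_llm` is a placeholder for whichever model API is used.
import json
import networkx as nx

TRIPLE_PROMPT = (
    "Extract (subject, relation, object) triples describing skills, roles, "
    "and education from the CV below. Answer with a JSON list of "
    "[subject, relation, object] triples only.\n\nCV:\n{cv_text}"
)

def call_llm(prompt: str) -> str:
    # Placeholder: plug in the LLM client of your choice here.
    raise NotImplementedError

def cv_to_graph(cv_text: str) -> nx.DiGraph:
    raw = call_llm(TRIPLE_PROMPT.format(cv_text=cv_text))
    graph = nx.DiGraph()
    for subject, relation, obj in json.loads(raw):
        graph.add_edge(subject, obj, relation=relation)
    return graph
```

Graph‑quality evaluation could then compare the extracted triples against human‑curated references, in line with the evaluation work mentioned above.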
Notable quotes / concise insights
- Multi‑stakeholder recommender systems: "A recommender system where multiple stakeholder needs need to be balanced" (job seekers, recruiters, HR).
- On lay audiences: "If you're dealing with data scientists, SHAP/LIME are fine. Not with your average Joe."
- On user behavior: participants "used the explanations as an information source, but not as a reason to come to a certain decision."
Where to follow / sources
- Roan: LinkedIn and Google Scholar (conference updates and publications). Links provided in the episode show notes.
If you’re building or improving job recommendation UX: start with clear, recruiter‑style textual explanations; provide options for verbosity; align explanations tightly to source CV/text; run live experiments and fairness audits before wide release.
