Tayler Erbe · Project Case Study · AITS / University of Illinois System · 2025

HR Workforce Analytics Intelligence Platform

A predictive workforce analytics initiative designed to transform raw HR and payroll data from the university's Enterprise Data Warehouse into forward-looking strategic intelligence: surfacing succession gaps, internal career pathways, attrition risk, and workforce growth trends before they become problems.

Active  ·  POC Complete · Full Build Under Review
Organization: AITS Decision Support · University of Illinois System
HR Stakeholder: Luanne Mayorga · Assistant Chancellor, Illinois HR
Status: POC Complete · Full Project Build Under Review
POC Work Completed: Internal Mobility · Succession Readiness · Growth Forecasting
Role: Lead Data Scientist · Platform Architect · Full Lifecycle Ownership
Planned Team: Lead Data Scientist + 3 Intern Analysts · ~2,460 hrs · 6 months
3 · POC Workstreams Completed
6+ · EDW HR Tables Referenced
5 · Core Analytics Capabilities
EDW · Existing HR & Payroll Infrastructure

Project Overview

Workforce planning at most universities, including the University of Illinois System, is largely reactive. Leadership learns about staffing challenges after they have already materialized: critical roles become vacant with no internal successors ready, departments lose years of institutional knowledge through unexpected attrition, and hiring decisions are made without visibility into internal mobility patterns or future demand.

This initiative is designed to change that. Developed in partnership with Luanne Mayorga, Assistant Chancellor for Illinois HR, the HR Workforce Analytics Intelligence Platform applies predictive modeling, career path simulation, and graph analytics to the university's existing HR and payroll data, with no new data collection required, to generate forward-looking intelligence that HR teams and leadership can act on before gaps become crises.

The project has already completed three substantial proof-of-concept workstreams covering internal mobility modeling, succession readiness prediction, and workforce growth forecasting. The platform architecture, modeling approach, and feature engineering pipeline are fully designed. The full project build is currently under review with the goal of beginning production development soon.

All POC work was built on a synthetic dataset mirroring the structure of EDW HR tables. This demonstrated the full analytical pipeline before any connection to production data, which will occur in Phase 1 under appropriate data governance review.

The Core Problem

The university cannot currently answer several fundamental workforce questions in real time: Which departments will face capacity gaps in the next 12 months? Which roles have no internal succession pipeline? Where is turnover signaling instability versus normal movement? Which employees are quietly at flight risk? Without predictive analytics, these questions surface only after the damage has been done.

What Makes This Different

Most HR reporting is descriptive: it tells you what happened. This platform is generative and predictive: it simulates where the workforce is going, who is ready to step up, which career pathways are silently blocked, and which departments are structurally understaffed before those conditions become visible in turnover reports.

Existing Data Infrastructure

Every insight the platform generates is derived from data the university already collects: HR records, payroll history, job titles, and tenure data within the Enterprise Data Warehouse. This is an analytics problem, not a data collection problem.

Seeing What the University Doesn't Know

The most important value this platform creates is not answering questions that are already being asked: it's surfacing insights no one knew to look for. The shift from reactive to proactive workforce intelligence means giving leadership and HR the visibility to act on trends months before those trends become operational problems.

Today: What Gets Seen Too Late
  • A long-tenured director retires with no successor identified or developed
  • A department loses three analysts in six months, and the instability is first noticed in an annual report
  • An open role sits unfilled for 90 days; qualified internal candidates were never surfaced
  • A cohort of mid-career employees in a low-growth role quietly disengages over two years
  • Leadership is surprised by budget pressure from overlapping vacancies across a unit
  • An entire cluster of specialized knowledge retires without a documented succession path
  • A role is repeatedly filled externally when strong internal candidates existed all along
With the Platform: Known in Advance
  • Which departments are most likely to face capacity gaps in the next 12–18 months
  • Which critical roles have no viable internal succession pipeline right now
  • Which employees have the strongest readiness profile to step into a target role
  • Which roles act as structural dead ends, stalling careers and driving attrition
  • Which units are silently losing institutional knowledge through retirement concentration
  • Where cross-department mobility opportunities exist that are currently invisible
  • Which job families are at risk from labor market competition based on external growth data
The platform is not a reporting tool. It is an intelligence layer: a system that continuously processes workforce data and surfaces the patterns that no one has time to look for manually, enabling the university to plan rather than react.

Three Core Analytics Capabilities

The platform delivers three interconnected analytical capabilities, each targeting a different dimension of workforce intelligence. Together they provide a comprehensive, forward-looking view of how the workforce is structured, how it moves, and where it is headed.

01
Internal Mobility Modeling & Career Path Analysis

Analyzes historical job transition data to construct a comprehensive map of how employees actually move through the organization: which roles feed into which, which pathways accelerate careers, and which are structural dead ends that stall progression and quietly drive attrition.

The core of this capability is a career transition matrix built from each employee's historical job title sequence: a probabilistic map of the most likely next role from any starting position. On top of that matrix, a Markov chain simulation engine projects career journeys forward, generating distributions of where employees in a given role tend to land after N moves. This gives HR a generative view of internal mobility: not just what happened historically, but what will likely happen next.
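The transition matrix and simulation described above can be sketched roughly as follows. This is a minimal illustration, not the POC code: the function names simulate_career_path() and simulate_many_paths() come from the project, but their signatures, the helper build_transition_matrix(), and the sample title histories are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def build_transition_matrix(title_sequences):
    """Count from_title -> to_title moves and normalize each row to probabilities."""
    counts = defaultdict(Counter)
    for seq in title_sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def simulate_career_path(matrix, start, n_moves, rng):
    """Walk the chain stochastically; stop early at a role with no observed exits."""
    path = [start]
    for _ in range(n_moves):
        options = matrix.get(path[-1])
        if not options:
            break
        titles = list(options)
        weights = [options[t] for t in titles]
        path.append(rng.choices(titles, weights=weights, k=1)[0])
    return path


def simulate_many_paths(matrix, start, n_moves, n_paths=1000, seed=0):
    """Summarize the final-role distribution over many simulated paths."""
    rng = random.Random(seed)
    finals = Counter(
        simulate_career_path(matrix, start, n_moves, rng)[-1] for _ in range(n_paths)
    )
    return {role: n / n_paths for role, n in finals.items()}


# Made-up title histories, one list per employee (oldest title first).
sequences = [
    ["Academic Advisor", "Senior Advisor", "Assistant Director"],
    ["Academic Advisor", "Senior Advisor", "Program Coordinator"],
    ["Academic Advisor", "Program Coordinator"],
    ["Senior Advisor", "Assistant Director"],
]
matrix = build_transition_matrix(sequences)
dist = simulate_many_paths(matrix, "Academic Advisor", n_moves=2)
```

Here dist maps each final role to the share of simulated paths that ended there after at most two moves. Roles with no observed onward transitions act as absorbing states in this sketch, which is one simple way to treat the end of an observed history.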

Graph analytics are then applied to the transition network to identify natural career communities (clusters of roles that tend to flow into each other) and to compute PageRank-style centrality scores that surface which roles are hubs of career movement versus isolated dead ends.

Markov Chain · Transition Matrix · NetworkX Graph · Community Detection · Career Pathways · Dead-End Detection · Feeder Role Analysis
02
Succession Readiness Prediction

Identifies which internal employees are most ready to step into leadership or specialist roles before those positions become vacant, and flags roles across the organization that have no viable internal succession pipeline at all.

Two sub-models drive this capability. The first estimates departure risk, the probability that a role becomes vacant in the next 6–12 months, using features like tenure slope, proximity to pay band ceiling, department churn rates, and retirement eligibility (SURS) indicators. The second estimates candidate readiness, predicting how well a given employee fits a target role based on skill similarity, grade progression, and mobility history.

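These scores feed a Succession Readiness Index (SRI): a composite score per (candidate, role) pair that ranks internal candidates and provides interpretable rationale for each recommendation. The readiness prediction enters the index alongside skill, tenure, and experience fit, while the departure risk score determines which roles are prioritized for succession analysis.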

SRI = w₁·p_success + w₂·SkillSim + w₃·TenureFit + w₄·ExperienceFit
Succession Readiness Index: a weighted composite of predictive readiness components per candidate-role pair
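As a worked illustration of the composite, the sketch below computes SRI for two made-up candidates. The weight values, the assumption that every component is scaled to [0, 1], and the function name succession_readiness_index() are hypothetical; the source specifies only the form of the weighted sum.

```python
def succession_readiness_index(p_success, skill_sim, tenure_fit, experience_fit,
                               weights=(0.4, 0.3, 0.15, 0.15)):
    """Weighted sum w1*p_success + w2*SkillSim + w3*TenureFit + w4*ExperienceFit.

    The default weights are illustrative placeholders, not tuned values.
    """
    components = (p_success, skill_sim, tenure_fit, experience_fit)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("each component is assumed to be scaled to [0, 1]")
    return sum(w * c for w, c in zip(weights, components))


# Made-up component scores for one target role.
candidates = {
    "cand_A": (0.72, 0.81, 0.60, 0.55),
    "cand_B": (0.65, 0.74, 0.90, 0.70),
}
ranked = sorted(
    candidates,
    key=lambda c: succession_readiness_index(*candidates[c]),
    reverse=True,
)
```

With these placeholder weights, cand_B ranks first (SRI 0.722 versus 0.7035) because stronger tenure and experience fit outweigh a slightly lower readiness prediction. Keeping the four components visible next to the total is what makes the ranking explainable.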
XGBoost / Logistic Regression · Gradient Boosting · LambdaMART · Departure Risk · SRI Score · Equity Audit View
03
Workforce Growth Forecasting & Attrition Intelligence

Projects headcount, hiring demand, and turnover patterns by department and job family, giving leadership a data-driven view of where the workforce is growing, contracting, or becoming structurally unstable over a 1–3 year horizon.

Time series and trend analysis methods are applied to historical EDW data to model each department's growth trajectory, turnover seasonality, and net headcount change. Clustering algorithms group departments by workforce behavior patterns: distinguishing stable and growing units from those showing early signs of instability. External labor market data from O*NET and BLS is incorporated to benchmark internal roles against occupational growth outlooks, identifying job families at elevated risk from market competition.

Time Series Analysis · Trend Decomposition · Department Clustering · O*NET Integration · BLS Growth Outlook · Attrition Scoring

Models & Algorithms

The platform uses a multi-model architecture: combining supervised machine learning, generative simulation, graph analytics, and NLP-driven skill enrichment to provide complementary views of workforce dynamics. No single model captures the full picture; the insight emerges from how these layers interact.

Promotion Prediction
Random Forest Classifier · Supervised
Purpose: Predict likelihood of promotion vs. lateral move within 12–18 months
Key Inputs: Age, tenure, time in role, move rate, pay grade, job family, growth outlook, title embeddings
Pipeline: ColumnTransformer → StandardScaler + OneHotEncoder → RandomForest (class_weight="balanced")
Output: Promotion likelihood score per employee
Rationale: Interpretable baseline; balances class imbalance; feature importance directly explainable in HR terms
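The promotion pipeline can be sketched with scikit-learn as below. This is an illustration on randomly generated data, not the POC model: the column names, the synthetic label rule, and the hyperparameters are assumptions, and the title-embedding inputs are left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names standing in for the engineered features.
numeric_cols = ["age", "tenure_years", "time_in_role_years", "move_rate"]
categorical_cols = ["pay_grade", "job_family", "growth_outlook"]

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(24, 65, n),
    "tenure_years": rng.uniform(0.5, 30, n),
    "time_in_role_years": rng.uniform(0.1, 10, n),
    "move_rate": rng.uniform(0, 1, n),
    "pay_grade": rng.choice(["G1", "G2", "G3"], n),
    "job_family": rng.choice(["IT", "Advising", "Finance"], n),
    "growth_outlook": rng.choice(["Low", "Moderate", "High", "Very High"], n),
})
# Synthetic label: promotion is the minority class, loosely tied to move rate.
y = (df["move_rate"] + rng.normal(0, 0.2, n) > 0.8).astype(int)

# Scale numeric features and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("rf", RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0)),
])
model.fit(df, y)

# Probability of the positive (promotion) class for each employee.
promotion_score = model.predict_proba(df)[:, 1]
```

Scoring the training rows, as done here, only shows the shape of the output; a real evaluation would hold out data or backtest against a later snapshot.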
Career Path Simulation
Markov Chain · Generative
Purpose: Simulate probable career journeys forward from any starting role across N moves
Key Inputs: Career transition matrix built from parsed past_job_titles sequences per employee
Method: simulate_career_path() walks transitions stochastically; simulate_many_paths() summarizes final-role distributions
Output: Probability distributions of career destinations from any starting job title
Use Case: "After 4 moves, where do Academic Advisors typically end up?", shown as a Sankey-style flow
Career Community Detection
Graph Analytics · NetworkX
Purpose: Identify natural clusters of roles that flow into each other ("career families")
Key Inputs: Weighted directed graph of role-to-role transitions (edge weight = transition probability)
Algorithm: greedy_modularity_communities (NetworkX); PageRank-style centrality for hub role identification
Output: Career clusters (e.g., "IT Cluster," "Academic Affairs Cluster"); hub roles; dead-end and feeder role classification
Use Case: Visualize career communities; surface roles that connect many pathways vs. those that isolate employees
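A small NetworkX sketch of the community detection step follows. The role names and edge weights are made up, and converting the directed graph to an undirected one before running greedy_modularity_communities is a simplifying assumption of this example rather than a documented choice of the project.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# (from_role, to_role, transition probability), all values illustrative.
edges = [
    ("Help Desk Analyst", "Systems Administrator", 0.6),
    ("Systems Administrator", "IT Manager", 0.5),
    ("Help Desk Analyst", "IT Manager", 0.2),
    ("Academic Advisor", "Senior Advisor", 0.7),
    ("Senior Advisor", "Assistant Dean", 0.4),
    ("Academic Advisor", "Assistant Dean", 0.1),
    ("Systems Administrator", "Senior Advisor", 0.05),  # weak cross-cluster link
]
G = nx.DiGraph()
G.add_weighted_edges_from(edges)

# Modularity-based clustering on the undirected view of the transition graph.
communities = [
    set(c) for c in greedy_modularity_communities(G.to_undirected(), weight="weight")
]

# Roles with no observed onward transitions are the simplest dead-end signal.
dead_ends = sorted(node for node in G if G.out_degree(node) == 0)
```

On this toy graph the weak cross-cluster edge is not enough to merge the two triangles, so two communities come back. A centrality score such as nx.pagerank(G, weight="weight") could be computed on the same graph to rank hub roles.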
Departure Risk Model
XGBoost / Logistic Regression · Supervised
Purpose: Estimate probability a role becomes vacant in the next 6–12 months
Key Inputs: Tenure slope, SURS retirement indicator, department churn rate, role-level attrition history, pay band ceiling proximity
Output: p_vacancy, a vacancy probability score per position
Succession Link: p_vacancy feeds directly into SRI matching; high-departure roles are automatically prioritized for succession analysis
Fallback: If model performance is insufficient, heuristic rules based on SURS status and tenure are used
Candidate Readiness & Succession Matching
Gradient Boosting · Learning-to-Rank (LambdaMART) · Semantic Embedding
What It Does
Predicts how well an internal candidate fits a specific target role. Uses a Learning-to-Rank approach (LambdaMART) to rank candidates within each role context rather than producing a single absolute score in isolation.
Skill Enrichment Layer
LLMs (GPT/LLaMA) extract skills from job titles and descriptions. Skills are mapped to O*NET canonical standards. Sentence Transformers (InstructorXL) generate semantic embeddings. Cosine similarity measures fit between candidate and target role skill profiles.
Equity & Explainability
Top-K candidate lists are reviewed for demographic representation balance. All recommendations are accompanied by interpretable feature rationale: e.g., "High tenure + high move rate in a high-growth job family." A fairness audit view surfaces recommendation parity across demographic groups.
Workforce Growth Forecasting Model

Time series decomposition and trend analysis are applied to historical EDW hiring, termination, and headcount records by department and job family. The model produces headcount forecasts over 1–3 year horizons, turnover trend models segmented by unit, and attrition risk scores per department cluster. BLS occupational growth outlook data (Very High / High / Moderate / Low) is joined to internal job family codes via O*NET crosswalks to identify which internal roles face elevated external competition for talent: a leading indicator of future turnover that doesn't yet appear in internal records.
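To make the forecasting idea concrete, the sketch below fits a straight-line trend to a made-up headcount series and extends it forward. A plain least-squares trend is a deliberately simple stand-in chosen for this example; the project's decomposition and seasonality methods are not specified in enough detail to reproduce, and the function names are hypothetical.

```python
def fit_linear_trend(values):
    """Least-squares fit of values against periods 0..n-1; returns (slope, intercept)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def project_headcount(history, periods_ahead):
    """Extend the fitted trend line past the end of the observed history."""
    slope, intercept = fit_linear_trend(history)
    n = len(history)
    return [intercept + slope * (n + k) for k in range(periods_ahead)]


# Made-up department headcount for six consecutive periods.
history = [120, 118, 115, 113, 110, 108]
slope, intercept = fit_linear_trend(history)
forecast = project_headcount(history, periods_ahead=4)
```

For this series the fitted slope is about -2.46 per period, so the first projected value is 105.4. A production model would add seasonality and uncertainty bands, which a single trend line cannot express.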

Feature Engineering

Every model in the platform is built on a shared foundation of derived analytical features computed from raw EDW data. These features transform flat HR records into a rich representation of each employee's career trajectory, mobility behavior, and structural fit within the organization.

Tenure Buckets
0–3 · 3–7 · 7–15 · 15+ years. Total organizational tenure and time in current role are separately bucketed to distinguish early-career mobility from long-tenured stability.
Move Rate per Year
n_internal_moves / tenure_years: a normalized signal of mobility intensity that controls for the fact that a longer-tenured employee has had more time to accumulate moves.
Transition Probability Matrix
For each from_title → to_title pair observed in historical job sequences: transition count and conditional probability. The backbone of Markov simulation and graph analytics.
Job Title Embeddings
SentenceTransformer vector representations of job titles (InstructorXL). Enable semantic similarity between roles for succession matching: e.g., "Research Analyst" and "Policy Analyst" are closer than "Research Analyst" and "Building Engineer."
Career Path Structure Metrics
Fast-track roles (high promotion probability + sufficient volume), dead-end roles (low onward transition diversity), feeder roles (transition to many distinct destinations), and diversity index per role.
Departure Risk Signals
SURS retirement eligibility indicator, proximity to pay band ceiling, department-level churn rate (trailing 12 months), tenure slope (rate of tenure accumulation relative to peers in same role class).
Mobility Cluster Labels
Behaviorally-derived segments: Early-career ladder, Fast-track leader, Stable long-tenure, Cross-department mobile. Used as a categorical feature across all models and as a dashboard filter dimension.
O*NET Growth Outlook
External labor market classification mapped to each internal job family via O*NET occupation code crosswalk: Low / Moderate / High / Very High. Enriches both the promotion model and the attrition risk model with labor market context the EDW alone cannot provide.
Skill Similarity Score
Cosine similarity between the O*NET skill vector of a candidate's current role and a target role. Computed after LLM-assisted skill extraction and O*NET canonical mapping. Input to both the Readiness model and the SRI composite.
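Three of the features above are simple enough to sketch directly. The function names are hypothetical, and two details are assumptions of this example: a boundary value such as exactly 3 years is assigned to the upper bucket, and zero tenure yields a move rate of 0.

```python
import math
from bisect import bisect_right

TENURE_EDGES = [3, 7, 15]
TENURE_LABELS = ["0–3", "3–7", "7–15", "15+"]


def tenure_bucket(years):
    """Map years of tenure to a bucket label; boundaries go to the upper bucket."""
    return TENURE_LABELS[bisect_right(TENURE_EDGES, years)]


def move_rate_per_year(n_internal_moves, tenure_years):
    """n_internal_moves / tenure_years, guarding against zero tenure."""
    if tenure_years <= 0:
        return 0.0
    return n_internal_moves / tenure_years


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length skill or embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy 3-dimensional skill vectors; real profiles would be much longer.
research_analyst = [0.9, 0.7, 0.1]
policy_analyst = [0.8, 0.8, 0.2]
building_engineer = [0.1, 0.2, 0.9]
close = cosine_similarity(research_analyst, policy_analyst)
far = cosine_similarity(research_analyst, building_engineer)
```

With these toy vectors, close is larger than far, mirroring the "Research Analyst" versus "Building Engineer" comparison used to motivate the title embeddings.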

Data Sources & EDW Architecture

All data required for this platform exists within the university's existing HR and payroll infrastructure. The Enterprise Data Warehouse (EDW) contains the historical employee, job, and payroll records needed to power every analytical capability described here. No new data collection, vendor relationships, or external surveys are required.

EDW Table | Contents | Used For
T_PERS_HIST | Biographical and demographic data: person ID, gender, age, hire date | Person-level joins and identifiers; demographic features for equity audit
V_EMPEE_HIST_1 | Current employment view: active employee filter | Filter to active employee population; point-in-time snapshots
T_EMPEE_HIST | Employment lifecycle: hire date, FTE, job class, current snapshot | Tenure computation, FTE features, employment continuity, mobility modeling
T_EMPEE_HIST_5 | Termination and SURS retirement details: separation codes, annuitant status | Departure risk modeling; retirement eligibility indicator; SURS feature
T_JOB_DETL_HIST | Job-level history: title, pay rate, department, supervisor, effective dates | Career transition sequences; role similarity; position mapping; pay grade features
T_PAYR_ACCTG_DETL | Payroll accounting history: earnings continuity, position-level records | Pay band ceiling proximity; earnings trend; tenure proxy via payroll continuity
O*NET Skill Taxonomies

Standardized occupational skill and knowledge attributes. Used to map internal job titles to canonical skill profiles and generate semantic embeddings for succession matching. Provides the external benchmark layer the EDW alone cannot supply.

BLS Workforce Trends

Bureau of Labor Statistics occupational growth outlook classifications. Joined to internal job families via O*NET codes to identify which roles face elevated external talent competition: a leading indicator for future internal attrition risk.

Augmented Cluster Tags

Mobility cluster labels and skill cluster IDs derived from job title NLP and behavioral history. Treated as pre-computed features from a separate embedding pipeline. Used for segmentation, dashboard filtering, and career community detection.

Effective-date logic (*_EFF_DT / *_EXP_DT) across EDW tables enables as-of snapshots at any point in time: essential for backtesting models against historical workforce trends and computing rolling features like tenure slope and department churn rate.
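An as-of snapshot over effective-dated rows can be sketched as below. The column names JOB_EFF_DT and JOB_EXP_DT are hypothetical instances of the *_EFF_DT / *_EXP_DT pattern, and the conventions that a row is valid from its effective date up to but not including its expiry date, and that an open-ended row has no expiry, are assumptions of this example.

```python
from datetime import date


def as_of(rows, snapshot_date, eff_key="JOB_EFF_DT", exp_key="JOB_EXP_DT"):
    """Return the rows that were in effect on snapshot_date."""
    return [
        r for r in rows
        if r[eff_key] <= snapshot_date
        and (r[exp_key] is None or snapshot_date < r[exp_key])
    ]


# Made-up job history for one person; None marks the currently open row.
job_history = [
    {"person_id": 1, "title": "Analyst",
     "JOB_EFF_DT": date(2018, 1, 1), "JOB_EXP_DT": date(2021, 7, 1)},
    {"person_id": 1, "title": "Senior Analyst",
     "JOB_EFF_DT": date(2021, 7, 1), "JOB_EXP_DT": None},
]

mid_2020 = as_of(job_history, date(2020, 6, 30))
on_change_day = as_of(job_history, date(2021, 7, 1))
```

Because the interval is half-open, the snapshot taken on the day of the title change returns only the new row, so a person never appears twice in one snapshot.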

What the Platform Surfaces

Every analytical capability translates into concrete, actionable intelligence delivered through the platform's dashboard and reporting layer. These are not abstract model outputs: they are workforce insights formatted for HR teams and leadership to act on directly.

Succession Gap Heatmap

Vacancy risk score (p_vacancy) plotted against bench depth: how many viable internal candidates exist for each role. Immediately surfaces which high-risk roles have no internal pipeline and require proactive development or hiring planning.
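The logic behind the heatmap can be illustrated with a short sketch that counts viable candidates per role and lists high-risk roles with an empty bench. The SRI cutoff, the vacancy threshold, the data layout, and both function names are hypothetical choices made for this example.

```python
def bench_depth(sri_by_candidate, min_sri=0.6):
    """Number of candidates whose SRI meets an (illustrative) viability cutoff."""
    return sum(1 for score in sri_by_candidate.values() if score >= min_sri)


def succession_gaps(roles, p_threshold=0.5, min_bench=1):
    """Roles with high vacancy risk and too few viable candidates, riskiest first."""
    gaps = []
    for role, info in roles.items():
        depth = bench_depth(info["sri"])
        if info["p_vacancy"] >= p_threshold and depth < min_bench:
            gaps.append((role, info["p_vacancy"], depth))
    return sorted(gaps, key=lambda g: g[1], reverse=True)


# Made-up vacancy probabilities and candidate SRI scores.
roles = {
    "Director of Advising": {"p_vacancy": 0.87, "sri": {"c1": 0.45, "c2": 0.52}},
    "IT Manager": {"p_vacancy": 0.70, "sri": {"c3": 0.81}},
    "Budget Analyst": {"p_vacancy": 0.20, "sri": {}},
}
gaps = succession_gaps(roles)
```

Only "Director of Advising" is flagged here: "IT Manager" is high risk but has one viable candidate, and "Budget Analyst" has no bench but low vacancy risk.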

Career Pathway Maps

Sankey-style visualizations showing the top 3 most likely next roles from any starting position, with transition probabilities. Markov simulation extends this to show where employees typically end up after 4+ moves: surfacing structural dead ends and fast-track ladders.

Internal Mobility Health

Department- and job-family-level metrics: average move rate, promotion-vs-lateral ratio, diversity of onward pathways, and identification of roles with unusually low mobility: a signal of stagnation and attrition risk that doesn't appear in turnover data until it's too late.
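One way to quantify "diversity of onward pathways" is the Shannon entropy of a role's next-role distribution, sketched below. Using entropy as the diversity index, the cutoff values, and the function names are assumptions of this example; the source does not specify how the index is computed.

```python
import math


def onward_diversity(next_role_probs):
    """Shannon entropy (in bits) of the next-role distribution for one role."""
    return -sum(p * math.log2(p) for p in next_role_probs.values() if p > 0)


def classify_role(next_role_probs, dead_end_max_bits=0.5, feeder_min_destinations=3):
    """Label a role from its onward transitions using illustrative cutoffs."""
    if not next_role_probs or onward_diversity(next_role_probs) <= dead_end_max_bits:
        return "dead-end"
    if len(next_role_probs) >= feeder_min_destinations:
        return "feeder"
    return "typical"


# Made-up rows from a transition matrix.
single_exit = {"Office Manager": 1.0}
two_exits = {"Senior Advisor": 0.5, "Program Coordinator": 0.5}
many_exits = {"Analyst": 0.4, "Coordinator": 0.3, "Specialist": 0.3}
labels = [classify_role(r) for r in (single_exit, two_exits, many_exits)]
```

An even split across two destinations gives exactly 1 bit of diversity, while a role with a single observed destination scores 0 and is labeled a dead end under these cutoffs.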

Candidate Readiness Shortlist

For each target role, a ranked list of internal candidates with their SRI score, skill similarity breakdown, tenure fit, and interpretable rationale. Includes a gap-to-green development plan identifying which specific skill gaps the candidate would need to address.

Department Growth Forecast

1–3 year headcount projections per department and job family, incorporating historical hiring trends, turnover rates, and external labor market outlook. Flags departments with converging retirement risk and low bench depth as highest-priority planning needs.

Labor Market Benchmarking

Internal job families ranked by external growth outlook (O*NET / BLS). Identifies which roles are in high-demand fields where the university competes against the private sector for talent: informing proactive retention and compensation strategy before market pressure materializes in exits.

Career Community Clusters

Network-detected groups of roles that naturally flow into each other: the "IT cluster," "Academic Affairs cluster," etc. Reveals cross-unit mobility opportunities that are currently invisible to HR, and identifies isolated role clusters where employees have limited natural pathways forward.

Equity & Recommendation Parity

An audit view that surfaces whether succession and promotion recommendations are proportionally distributed across demographic groups. Ensures the platform's outputs support equitable talent development practices and flags any model patterns that require fairness review before operational use.
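A parity check of this kind can be sketched by comparing shortlist selection rates across groups. The data layout, the group labels, and the use of a min-to-max rate ratio as the parity measure are assumptions of this example, not the platform's documented audit method.

```python
from collections import Counter


def selection_rates(candidates, shortlisted_ids):
    """Share of each group's candidates that made the shortlist."""
    totals = Counter(c["group"] for c in candidates)
    picked = Counter(c["group"] for c in candidates if c["id"] in shortlisted_ids)
    return {group: picked[group] / totals[group] for group in totals}


def parity_ratio(rates):
    """Lowest selection rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Made-up pool: six candidates in group A, four in group B.
candidates = [{"id": i, "group": "A" if i < 6 else "B"} for i in range(10)]
shortlisted = {0, 1, 2, 6}

rates = selection_rates(candidates, shortlisted)
ratio = parity_ratio(rates)
```

Here group A is shortlisted at a rate of 0.5 and group B at 0.25, giving a ratio of 0.5. A low ratio is a prompt for human review of the model and its inputs, not a verdict on its own.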

Scenario-Based Intelligence

These illustrative scenarios show what the platform makes possible. Each represents a real workforce challenge that is currently invisible until it becomes an operational problem, along with the specific intelligence the platform would surface to allow proactive action instead.

01
Department Chair Retirement: Succession Without a Pipeline
A department chair's SURS indicator crosses the retirement eligibility threshold. The departure risk model flags p_vacancy = 0.87 for the role within 12 months. The succession model surfaces three internal candidates with ≥80% skill similarity to the role's O*NET profile: two from within the department and one from a related unit identified via graph community detection. HR initiates early mentorship and structured leadership transition planning eight months before the vacancy occurs, rather than three weeks after the announcement.
02
Dead-End Role Cluster Driving Silent Attrition
The career community detection model identifies a cluster of mid-level administrative roles with near-zero onward transition diversity: employees in these roles rarely move anywhere else in the organization. The promotion likelihood metric shows these roles have a 4% internal promotion rate versus a 31% system average. Cross-referenced with tenure data, the cluster shows long average tenure followed by abrupt departure: a signature of stagnation-driven attrition. HR and leadership now have the evidence to redesign progression pathways and create deliberate mobility bridges before losing another cohort of experienced staff.
03
Cross-Unit Candidate Surface: Internal Hiring Opportunity
A specialist research role opens in one college. Historically, HR would post externally because no obvious candidates are visible within the unit. The succession model's semantic skill matching identifies a strong internal candidate from a different college: a role that shares 74% of the target's O*NET skill profile despite having a different job title. The candidate has a high SRI score driven by strong grade progression and mobility history. The cross-unit transfer is explored before any external search is posted, saving recruiting time and cost while strengthening internal mobility culture.
04
Department Growth Forecast Signals a Hiring Cliff
The workforce growth forecasting model identifies that a service-critical department is trending toward a 23% headcount reduction over 18 months, driven by a retirement concentration wave combined with low recent hiring. External BLS data shows the relevant job family has a "Very High" growth outlook nationally, meaning competition for replacements will be intense. Leadership now has 18 months of lead time to adjust budget planning, accelerate hiring pipelines, and explore whether internal development can fill part of the gap, instead of discovering the shortfall when operations are already affected.

Proof-of-Concept Work Completed

Before the project was submitted for full-build review, three substantial proof-of-concept workstreams were completed to validate the technical approach and demonstrate feasibility on synthetic datasets mirroring EDW structure. These POCs established the full analytical pipeline, modeling architecture, and feature engineering logic that the production platform will implement against real data.

Internal Mobility & Career Path Forecasting
Ruchita Alate · Data Science Analyst
  • Synthetic 500-employee dataset mirroring EDW HR table structure
  • Full feature engineering pipeline: tenure buckets, move rate, title embeddings, transition matrix
  • Random Forest promotion prediction model with balanced class weighting
  • Markov chain career path simulation: simulate_career_path() + simulate_many_paths()
  • Career community detection via NetworkX + greedy modularity
  • Fast-track, dead-end, and feeder role classification
  • Complete dashboard mockup suite: Sankey flows, transition waterfalls, network density
  • Produced: career_transition_matrix.csv, internal_mobility_features_for_dashboard.csv
Succession Readiness Prediction
Tejasri Joshi · Data Science Analyst
  • EDW schema mapping for succession use case (T_PERS_HIST, T_JOB_DETL_HIST, T_EMPEE_HIST_5)
  • Departure risk model design: XGBoost / Logistic Regression on SURS + churn features
  • Candidate readiness model: Gradient Boosting + LambdaMART Learning-to-Rank
  • O*NET skill enrichment pipeline with FAISS embedding index
  • SRI composite formula: w₁·p_success + w₂·SkillSim + w₃·TenureFit + w₄·ExperienceFit
  • Succession dashboard prototype: vacancy heatmap, skill gap views, equity audit
  • Three scenario walkthroughs: retirement succession, training gap, cross-unit mobility
Workforce Growth Forecasting
Tanvi · Data Science Analyst
  • Headcount forecasting by department and job family
  • Hiring demand prediction models
  • Turnover trend analysis and seasonality decomposition
  • Department growth and stability clustering
  • Integration of BLS / O*NET external growth outlook data
  • Attrition risk scoring at the unit level
  • Dashboard visualization prototypes for Power BI / Tableau integration
The POC phase was not exploratory: it was deliberate validation. Every model architecture, feature engineering decision, EDW table mapping, and dashboard design planned for the production platform is grounded in work that was prototyped, tested, and evaluated prior to seeking full production build approval.

Build Roadmap: Pending Resource Approval

The platform build is structured across four sequential phases spanning approximately six months with a team of one lead data scientist and three intern analysts (~2,460 total hours). The project is currently under review and the build is expected to begin once resources are confirmed.

Phase | Name | Duration | Focus
Phase 1 | Data Preparation & Quality Analysis | Weeks 1–6 · ~500 hrs | EDW extraction, data cleaning, employee transition dataset construction, feature engineering, O*NET/BLS external join
Phase 2 | Workforce Analytics Model Development | Weeks 6–14 · ~800 hrs | Promotion model, transition matrix + Markov simulation, graph analytics, succession models, growth forecasting (all three tracks in parallel)
Phase 3 | Model Validation & Output Generation | Weeks 14–18 · ~400 hrs | Validation against historical trends, parameter refinement, fairness review across demographic groups, structured output generation for dashboard layer
Phase 4 | Workforce Intelligence Platform Development | Weeks 18–26 · ~760 hrs | Platform architecture, Power BI / Tableau dashboard build, career pathway maps, mobility network views, HR reporting tools, leadership user testing

Long-Term Vision & Strategic Fit

The Workforce Analytics Platform is not a standalone project. By standardizing HR data, enriching it with external occupational frameworks, and generating structured semantic metadata, it establishes a reusable workforce data infrastructure that supports a broader institutional AI and analytics strategy well beyond the scope of this initial build.

Future Capability | How This Platform Enables It
ERP Modernization Readiness | Standardized workforce data, job taxonomy normalization, and enriched metadata reduce migration risk and improve data consistency for future ERP upgrades, a direct institutional priority.
AI-Driven Talent Acquisition | Semantic role embeddings and career mobility models enable intelligent internal candidate identification and job matching before external searches are opened.
Skill Gap Analysis | Mapping internal job roles to O*NET/BLS frameworks enables workforce capability gap modeling: identifying where the university's skill profile is diverging from future demand.
LLM-Powered HR Assistant | Clean workforce datasets, structured metadata, and semantic embeddings provide the retrieval and grounding layer required for conversational AI tools (e.g., Denodo AI interface).
University-Wide Analytics Platform | The platform architecture establishes a reusable semantic data layer that can extend beyond HR to finance, research operations, and enrollment analytics.