A predictive workforce analytics initiative designed to transform raw HR and payroll data from the university's Enterprise Data Warehouse into forward-looking strategic intelligence: surfacing succession gaps, internal career pathways, attrition risk, and workforce growth trends before they become problems.
Workforce planning at most universities, including the University of Illinois System, is largely reactive. Leadership learns about staffing challenges after they have already materialized: critical roles become vacant with no internal successors ready, departments lose years of institutional knowledge through unexpected attrition, and hiring decisions are made without visibility into internal mobility patterns or future demand.
This initiative is designed to change that. Developed in partnership with Luanne Mayorga, Assistant Chancellor for Illinois HR, the HR Workforce Analytics Intelligence Platform applies predictive modeling, career path simulation, and graph analytics to the university's existing HR and payroll data, with no new data collection required, to generate forward-looking intelligence that HR teams and leadership can act on before gaps become crises.
The project has already completed three substantial proof-of-concept workstreams covering internal mobility modeling, succession readiness prediction, and workforce growth forecasting. The platform architecture, modeling approach, and feature engineering pipeline are fully designed. The full project build is currently under review with the goal of beginning production development soon.
All POC work was built on a synthetic dataset mirroring the structure of EDW HR tables, demonstrating the full analytical pipeline before any connection to production data. That connection will occur in Phase 1 under appropriate data governance review.
The university cannot currently answer several fundamental workforce questions in real time: Which departments will face capacity gaps in the next 12 months? Which roles have no internal succession pipeline? Where is turnover signaling instability versus normal movement? Which employees are quietly at flight risk? Without predictive analytics, these questions surface only after the damage has been done.
Most HR reporting is descriptive: it tells you what happened. This platform is generative and predictive: it simulates where the workforce is going, who is ready to step up, which career pathways are silently blocked, and which departments are structurally understaffed before those conditions become visible in turnover reports.
Every insight the platform generates is derived from data the university already collects: HR records, payroll history, job titles, and tenure data within the Enterprise Data Warehouse. This is an analytics problem, not a data collection problem.
The most important value this platform creates is not answering questions that are already being asked; it is surfacing insights no one knew to look for. The shift from reactive to proactive workforce intelligence means giving leadership and HR the visibility to act on trends months before those trends become operational problems.
The platform delivers three interconnected analytical capabilities, each targeting a different dimension of workforce intelligence. Together they provide a comprehensive, forward-looking view of how the workforce is structured, how it moves, and where it is headed.
Analyzes historical job transition data to construct a comprehensive map of how employees actually move through the organization: which roles feed into which, which pathways accelerate careers, and which are structural dead ends that stall progression and quietly drive attrition.
The core of this capability is a career transition matrix, a probabilistic map of the most likely next role from any starting position, built from each employee's historical job title sequence. On top of that matrix, a Markov chain simulation engine projects career journeys forward, generating distributions of where employees in a given role tend to land after N moves. This gives HR a generative view of internal mobility: not just what happened historically, but what will likely happen next.
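A minimal sketch of the transition matrix and the forward simulation, using only the Python standard library. The job titles and sequences are invented for illustration, and treating a role with no observed onward move as absorbing is an assumption of this sketch rather than a documented platform rule.

```python
import random
from collections import Counter, defaultdict

# Hypothetical job-title sequences, one per employee, in chronological order.
sequences = [
    ["Analyst I", "Analyst II", "Senior Analyst"],
    ["Analyst I", "Analyst II", "Manager"],
    ["Analyst I", "Coordinator"],
    ["Analyst II", "Senior Analyst", "Manager"],
]


def build_transition_matrix(seqs):
    """Count observed role-to-role moves and normalize each row to probabilities."""
    counts = defaultdict(Counter)
    for seq in seqs:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def simulate(matrix, start, n_moves, n_runs=10_000, seed=0):
    """Monte Carlo walk: share of runs ending in each role after up to n_moves moves."""
    rng = random.Random(seed)
    landed = Counter()
    for _ in range(n_runs):
        role = start
        for _ in range(n_moves):
            nxt = matrix.get(role)
            if not nxt:  # no observed onward move: treated as absorbing (assumption)
                break
            role = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        landed[role] += 1
    return {role: n / n_runs for role, n in landed.items()}


matrix = build_transition_matrix(sequences)
after_two = simulate(matrix, "Analyst I", n_moves=2)
```

Each row of `matrix` sums to 1, and `after_two` approximates the distribution of landing roles two moves out from "Analyst I". Raising `n_moves` gives the longer-horizon view described above.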
Graph analytics are then applied to the transition network to identify natural career communities (clusters of roles that tend to flow into each other) and to compute PageRank-style centrality scores that surface which roles are hubs of career movement versus isolated dead ends.
Identifies which internal employees are most ready to step into leadership or specialist roles before those positions become vacant, and flags roles across the organization that have no viable internal succession pipeline at all.
Two sub-models drive this capability. The first estimates departure risk, the probability that a role becomes vacant in the next 6–12 months, using features such as tenure slope, proximity to pay band ceiling, department churn rates, and retirement eligibility (SURS) indicators. The second estimates candidate readiness, predicting how well a given employee fits a target role based on skill similarity, grade progression, and mobility history.
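The centrality idea can be sketched with a plain power-iteration PageRank over weighted transition edges. The edge list is invented, and the grouping step uses weakly connected components as a deliberately crude stand-in for community detection; a modularity-based method would be the more usual choice on a real network.

```python
from collections import defaultdict

# Hypothetical weighted transition edges: (from_role, to_role, observed moves).
edges = [
    ("Analyst I", "Analyst II", 8),
    ("Analyst II", "Senior Analyst", 5),
    ("Analyst II", "Manager", 2),
    ("Senior Analyst", "Manager", 4),
    ("Lab Tech I", "Lab Tech II", 6),
]


def pagerank(edges, damping=0.85, iters=100):
    """Power-iteration PageRank on a weighted directed graph."""
    nodes = sorted({n for s, d, _ in edges for n in (s, d)})
    out = defaultdict(dict)
    for s, d, w in edges:
        out[s][d] = out[s].get(d, 0) + w
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Rank held by roles with no onward moves is spread evenly over all roles.
        dangling = sum(rank[n] for n in nodes if n not in out)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new = {n: base for n in nodes}
        for s, dsts in out.items():
            total = sum(dsts.values())
            for d, w in dsts.items():
                new[d] += damping * rank[s] * w / total
        rank = new
    return rank


def weak_components(edges):
    """Group roles linked by any transition, ignoring direction."""
    adj = defaultdict(set)
    for s, d, _ in edges:
        adj[s].add(d)
        adj[d].add(s)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


ranks = pagerank(edges)
communities = weak_components(edges)
```

In this toy network "Manager" receives the highest score because several pathways converge on it, while the two lab roles form their own isolated group.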
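Two of the departure-risk features named above can be illustrated with simple ratios. The exact definitions are not given in this document, so both formulas below are assumptions: ceiling proximity as position within the pay band, and churn as separations over average headcount for a period.

```python
def ceiling_proximity(pay_rate, band_min, band_max):
    """Position within the pay band, clipped to [0, 1]; 1.0 means at the ceiling."""
    if band_max <= band_min:
        raise ValueError("band_max must exceed band_min")
    share = (pay_rate - band_min) / (band_max - band_min)
    return min(max(share, 0.0), 1.0)


def churn_rate(separations, avg_headcount):
    """Separations in a period divided by average headcount in that period."""
    return separations / avg_headcount if avg_headcount else 0.0


# Hypothetical values for one employee and one department.
proximity = ceiling_proximity(pay_rate=90_000, band_min=50_000, band_max=100_000)
dept_churn = churn_rate(separations=6, avg_headcount=40)
```

Here `proximity` is 0.8 and `dept_churn` is 0.15; features like these would feed the departure-risk model alongside tenure and retirement-eligibility indicators.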
These two scores combine into a Succession Readiness Index (SRI): a composite score per (candidate, role) pair that ranks internal candidates and provides interpretable rationale for each recommendation.
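The document does not state how the two scores are combined, so the sketch below assumes the simplest option, a product of vacancy probability and readiness. The `Pair` class, field names, and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    candidate: str
    role: str
    p_vacancy: float  # departure-risk output for the role, in [0, 1]
    readiness: float  # readiness output for this candidate and role, in [0, 1]


def sri(pair):
    """Assumed combination rule: product of the two model scores."""
    return pair.p_vacancy * pair.readiness


def rank_candidates(pairs, role):
    """Internal candidates for one role, highest SRI first."""
    return sorted((p for p in pairs if p.role == role), key=sri, reverse=True)


pairs = [
    Pair("E1", "Director of Payroll", 0.70, 0.40),
    Pair("E2", "Director of Payroll", 0.70, 0.85),
    Pair("E3", "HR Business Partner", 0.20, 0.90),
]
ranked = rank_candidates(pairs, "Director of Payroll")
```

A product keeps the index in [0, 1] and pushes a pair down when either the role is unlikely to open or the candidate is a weak fit. A weighted sum would behave differently and may suit other planning goals.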
Projects headcount, hiring demand, and turnover patterns by department and job family, giving leadership a data-driven view of where the workforce is growing, contracting, or becoming structurally unstable over a 1–3 year horizon.
Time series and trend analysis methods are applied to historical EDW data to model each department's growth trajectory, turnover seasonality, and net headcount change. Clustering algorithms group departments by workforce behavior patterns, distinguishing stable and growing units from those showing early signs of instability. External labor market data from O*NET and BLS is incorporated to benchmark internal roles against occupational growth outlooks, identifying job families at elevated risk from market competition.
The platform uses a multi-model architecture that combines supervised machine learning, generative simulation, graph analytics, and NLP-driven skill enrichment to provide complementary views of workforce dynamics. No single model captures the full picture; the insight emerges from how these layers interact.
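As the simplest instance of trend analysis, a department's headcount can be projected with an ordinary least-squares line. The history values are invented, and this sketch leaves out seasonality and clustering entirely.

```python
def linear_trend_forecast(history, horizon):
    """Fit y = intercept + slope * t by least squares and extend it `horizon` steps."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + h) for h in range(horizon)]


# Hypothetical year-end headcount for one department, oldest first.
headcount = [100, 104, 108, 112]
forecast = linear_trend_forecast(headcount, horizon=3)
```

With a steady gain of four positions per year, the three-year projection continues the line to roughly 116, 120, and 124.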
Time series decomposition and trend analysis are applied to historical EDW hiring, termination, and headcount records by department and job family. The model produces headcount forecasts over 1–3 year horizons, turnover trend models segmented by unit, and attrition risk scores per department cluster. BLS occupational growth outlook data (Very High / High / Moderate / Low) is joined to internal job family codes via O*NET crosswalks to identify which internal roles face elevated external competition for talent, a leading indicator of future turnover that does not yet appear in internal records.
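The crosswalk join can be pictured as two dictionary lookups. The job family codes, occupation codes, and outlook assignments below are placeholders, not real O*NET or BLS values; only the four outlook labels come from the text above.

```python
# Hypothetical crosswalk: internal job family code -> occupation code.
family_to_occupation = {
    "IT-DEV": "00-0001",
    "HR-GEN": "00-0002",
    "FAC-OPS": "00-0003",  # no outlook entry below, to show the unmatched case
}

# Hypothetical outlook table keyed by occupation code.
occupation_outlook = {
    "00-0001": "Very High",
    "00-0002": "Moderate",
}

ELEVATED = {"Very High", "High"}


def external_pressure(crosswalk, outlooks):
    """Attach an outlook label to each job family and flag elevated competition."""
    result = {}
    for family, code in crosswalk.items():
        outlook = outlooks.get(code)  # None when the crosswalk has no match
        result[family] = {"outlook": outlook, "elevated": outlook in ELEVATED}
    return result


pressure = external_pressure(family_to_occupation, occupation_outlook)
```

Unmatched families surface with an outlook of `None` instead of being dropped, which makes crosswalk coverage gaps visible during data preparation.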
Every model in the platform is built on a shared foundation of derived analytical features computed from raw EDW data. These features transform flat HR records into a rich representation of each employee's career trajectory, mobility behavior, and structural fit within the organization.
All data required for this platform exists within the university's existing HR and payroll infrastructure. The Enterprise Data Warehouse (EDW) contains the historical employee, job, and payroll records needed to power every analytical capability described here. No new data collection, vendor relationships, or external surveys are required.
| EDW Table | Contents | Used For |
|---|---|---|
| T_PERS_HIST | Biographical and demographic data: person ID, gender, age, hire date | Person-level joins and identifiers; demographic features for equity audit |
| V_EMPEE_HIST_1 | Current employment view: active employee filter | Filter to active employee population; point-in-time snapshots |
| T_EMPEE_HIST | Employment lifecycle: hire date, FTE, job class, current snapshot | Tenure computation, FTE features, employment continuity, mobility modeling |
| T_EMPEE_HIST_5 | Termination and SURS retirement details: separation codes, annuitant status | Departure risk modeling; retirement eligibility indicator; SURS feature |
| T_JOB_DETL_HIST | Job-level history: title, pay rate, department, supervisor, effective dates | Career transition sequences; role similarity; position mapping; pay grade features |
| T_PAYR_ACCTG_DETL | Payroll accounting history: earnings continuity, position-level records | Pay band ceiling proximity; earnings trend; tenure proxy via payroll continuity |
Standardized occupational skill and knowledge attributes. Used to map internal job titles to canonical skill profiles and generate semantic embeddings for succession matching. Provides the external benchmark layer the EDW alone cannot supply.
Bureau of Labor Statistics occupational growth outlook classifications. Joined to internal job families via O*NET codes to identify which roles face elevated external talent competition, a leading indicator of future internal attrition risk.
Mobility cluster labels and skill cluster IDs derived from job title NLP and behavioral history. Treated as pre-computed features from a separate embedding pipeline. Used for segmentation, dashboard filtering, and career community detection.
Effective and expiration date columns (*_EFF_DT / *_EXP_DT) across EDW tables enable as-of snapshots at any point in time, which is essential for backtesting models against historical workforce trends and computing rolling features like tenure slope and department churn rate.
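An as-of snapshot reduces to a date-range filter. The rows and column names below are hypothetical and only follow the *_EFF_DT / *_EXP_DT naming pattern; the half-open interval and the use of `None` for an open-ended row are assumptions that would need checking against the actual EDW conventions.

```python
from datetime import date

# Hypothetical job-history rows.
job_rows = [
    {"person_id": 1, "title": "Analyst I",
     "job_eff_dt": date(2019, 1, 1), "job_exp_dt": date(2021, 7, 1)},
    {"person_id": 1, "title": "Analyst II",
     "job_eff_dt": date(2021, 7, 1), "job_exp_dt": None},
    {"person_id": 2, "title": "Coordinator",
     "job_eff_dt": date(2020, 3, 1), "job_exp_dt": date(2022, 1, 1)},
]


def as_of(rows, snapshot):
    """Rows in effect on `snapshot`, assuming [eff, exp) and None = still current."""
    return [
        r for r in rows
        if r["job_eff_dt"] <= snapshot
        and (r["job_exp_dt"] is None or snapshot < r["job_exp_dt"])
    ]


titles_2020 = [r["title"] for r in as_of(job_rows, date(2020, 6, 30))]
titles_2023 = [r["title"] for r in as_of(job_rows, date(2023, 1, 1))]
```

Running the same filter at successive dates yields the point-in-time training sets needed for backtesting, without leaking later job changes into earlier snapshots.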
Every analytical capability translates into concrete, actionable intelligence delivered through the platform's dashboard and reporting layer. These are not abstract model outputs; they are workforce insights formatted for HR teams and leadership to act on directly.
Vacancy risk score (p_vacancy) plotted against bench depth: how many viable internal candidates exist for each role. Immediately surfaces which high-risk roles have no internal pipeline and require proactive development or hiring planning.
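A sketch of how the risk-versus-bench view could flag exposed roles. The cutoffs (0.5 for vacancy risk, 0.6 for a viable candidate) and the role data are illustrative assumptions, not platform settings.

```python
def bench_depth(readiness_by_candidate, min_readiness=0.6):
    """Number of internal candidates at or above the readiness cutoff."""
    return sum(1 for r in readiness_by_candidate.values() if r >= min_readiness)


def exposed_roles(roles, risk_cutoff=0.5, min_readiness=0.6):
    """Roles with high vacancy risk and no viable internal candidate."""
    flagged = []
    for name, info in roles.items():
        depth = bench_depth(info["readiness"], min_readiness)
        if info["p_vacancy"] >= risk_cutoff and depth == 0:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical roles with vacancy risk and per-candidate readiness scores.
roles = {
    "Director of Payroll": {"p_vacancy": 0.72, "readiness": {"E1": 0.41, "E2": 0.55}},
    "HR Business Partner": {"p_vacancy": 0.65, "readiness": {"E3": 0.81}},
    "Benefits Analyst": {"p_vacancy": 0.12, "readiness": {}},
}
exposed = exposed_roles(roles)
```

Only "Director of Payroll" is flagged: it is likely to open and no candidate clears the readiness cutoff, whereas "Benefits Analyst" has an empty bench but low vacancy risk.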
Sankey-style visualizations showing the top 3 most likely next roles from any starting position, with transition probabilities. Markov simulation extends this to show where employees typically end up after 4+ moves, surfacing structural dead ends and fast-track ladders.
Department- and job-family-level metrics: average move rate, promotion-vs-lateral ratio, diversity of onward pathways, and identification of roles with unusually low mobility, a signal of stagnation and attrition risk that doesn't appear in turnover data until it's too late.
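Two of these metrics can be sketched directly from a list of moves. Defining a promotion as a grade increase and measuring pathway diversity with Shannon entropy are assumptions of this example; the moves themselves are invented.

```python
import math
from collections import Counter

# Hypothetical moves: (from_role, to_role, from_grade, to_grade).
moves = [
    ("Analyst I", "Analyst II", 5, 6),
    ("Analyst I", "Analyst II", 5, 6),
    ("Analyst I", "Coordinator", 5, 5),
    ("Analyst I", "Specialist", 5, 5),
    ("Clerk", "Senior Clerk", 3, 4),
    ("Clerk", "Senior Clerk", 3, 4),
]


def promotion_lateral_ratio(moves):
    """Promotions (grade increase) per lateral move (same grade)."""
    promotions = sum(1 for m in moves if m[3] > m[2])
    laterals = sum(1 for m in moves if m[3] == m[2])
    return promotions / laterals if laterals else float("inf")


def pathway_entropy(moves, role):
    """Shannon entropy in bits of the onward destinations from `role`."""
    destinations = Counter(m[1] for m in moves if m[0] == role)
    total = sum(destinations.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in destinations.values())


ratio = promotion_lateral_ratio(moves)
analyst_diversity = pathway_entropy(moves, "Analyst I")
clerk_diversity = pathway_entropy(moves, "Clerk")
```

An entropy of zero for "Clerk" means every observed move went to the same destination, the kind of narrow pathway the low-mobility signal is meant to surface.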
For each target role, a ranked list of internal candidates with their SRI score, skill similarity breakdown, tenure fit, and interpretable rationale. Includes a gap-to-green development plan identifying which specific skill gaps the candidate would need to address.
1–3 year headcount projections per department and job family, incorporating historical hiring trends, turnover rates, and external labor market outlook. Flags departments with converging retirement risk and low bench depth as highest-priority planning needs.
Internal job families ranked by external growth outlook (O*NET / BLS). Identifies which roles are in high-demand fields where the university competes against the private sector for talent, informing proactive retention and compensation strategy before market pressure materializes in exits.
Network-detected groups of roles that naturally flow into each other, such as an "IT cluster" or an "Academic Affairs cluster." Reveals cross-unit mobility opportunities that are currently invisible to HR, and identifies isolated role clusters where employees have limited natural pathways forward.
An audit view that surfaces whether succession and promotion recommendations are proportionally distributed across demographic groups. Ensures the platform's outputs support equitable talent development practices and flags any model patterns that require fairness review before operational use.
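One common way to check proportionality is to compare recommendation rates across groups. The sketch below uses the ratio of the lowest to the highest group rate with a 0.8 review trigger, borrowed from the four-fifths guideline used in employment selection; whether the platform adopts that threshold is an assumption, and the records are invented.

```python
from collections import Counter


def selection_rates(records):
    """Share of employees recommended, per group. records: (group, recommended)."""
    totals, selected = Counter(), Counter()
    for group, recommended in records:
        totals[group] += 1
        selected[group] += int(recommended)
    return {group: selected[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest group rate divided by highest group rate (1.0 means parity)."""
    top = max(rates.values())
    return min(rates.values()) / top if top else 1.0


# Hypothetical audit input: 10 employees per group.
records = (
    [("A", True)] * 5 + [("A", False)] * 5
    + [("B", True)] * 3 + [("B", False)] * 7
)
rates = selection_rates(records)
ratio = impact_ratio(rates)
needs_review = ratio < 0.8  # assumed review trigger
```

Group B is recommended at 60 percent of group A's rate in this example, so the output would be routed to fairness review before operational use.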
These illustrative scenarios show what the platform makes possible. Each pairs a real workforce challenge that is currently invisible until it becomes an operational problem with the specific intelligence the platform would surface to allow proactive action instead.
Ahead of the full build, which is still under review, three substantial proof-of-concept workstreams were completed to validate the technical approach and demonstrate feasibility on synthetic datasets mirroring EDW structure. These POCs established the full analytical pipeline, modeling architecture, and feature engineering logic that the production platform will implement against real data.
The platform build is structured across four sequential phases spanning approximately six months with a team of one lead data scientist and three intern analysts (~2,460 total hours). The project is currently under review and the build is expected to begin once resources are confirmed.
| Phase | Name | Duration | Focus |
|---|---|---|---|
| Phase 1 | Data Preparation & Quality Analysis | Weeks 1–6 · ~500 hrs | EDW extraction, data cleaning, employee transition dataset construction, feature engineering, O*NET/BLS external join |
| Phase 2 | Workforce Analytics Model Development | Weeks 6–14 · ~800 hrs | Promotion model, transition matrix + Markov simulation, graph analytics, succession models, growth forecasting (all three tracks in parallel) |
| Phase 3 | Model Validation & Output Generation | Weeks 14–18 · ~400 hrs | Validation against historical trends, parameter refinement, fairness review across demographic groups, structured output generation for dashboard layer |
| Phase 4 | Workforce Intelligence Platform Development | Weeks 18–26 · ~760 hrs | Platform architecture, Power BI / Tableau dashboard build, career pathway maps, mobility network views, HR reporting tools, leadership user testing |
The Workforce Analytics Platform is not a standalone project. By standardizing HR data, enriching it with external occupational frameworks, and generating structured semantic metadata, it establishes a reusable workforce data infrastructure that supports a broader institutional AI and analytics strategy well beyond the scope of this initial build.
| Future Capability | How This Platform Enables It |
|---|---|
| ERP Modernization Readiness | Standardized workforce data, job taxonomy normalization, and enriched metadata reduce migration risk and improve data consistency for future ERP upgrades, a direct institutional priority. |
| AI-Driven Talent Acquisition | Semantic role embeddings and career mobility models enable intelligent internal candidate identification and job matching before external searches are opened. |
| Skill Gap Analysis | Mapping internal job roles to O*NET/BLS frameworks enables workforce capability gap modeling: identifying where the university's skill profile is diverging from future demand. |
| LLM-Powered HR Assistant | Clean workforce datasets, structured metadata, and semantic embeddings provide the retrieval and grounding layer required for conversational AI tools (e.g., Denodo AI interface). |
| University-Wide Analytics Platform | The platform architecture establishes a reusable semantic data layer that can extend beyond HR to finance, research operations, and enrollment analytics. |