Bias risk identification across the lifecycle
Structured identification of where bias enters — historical bias in the training data, sampling bias in the collection, labelling bias in the annotation, measurement bias in the features, aggregation bias in the modelling target, deployment bias in the population the model is applied to, and feedback-loop bias in the data the system itself generates over time. Each entry point has a different mitigation surface, and Saiph engagements identify them per use case rather than treating bias as a single phenomenon.
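As a minimal sketch, the structure below shows how such a per-use-case bias register might be kept; the entry points mirror the list above, while the mitigation-surface examples, field names, and the credit-scoring entry are illustrative assumptions.

```python
from dataclasses import dataclass

# Entry points where bias can enter the lifecycle, each mapped to an example
# mitigation surface. The surfaces shown are illustrative, not exhaustive.
BIAS_ENTRY_POINTS = {
    "historical":    "re-weighting or re-sampling of the training data",
    "sampling":      "collection redesign, coverage-gap augmentation",
    "labelling":     "annotation guidelines, inter-annotator audits",
    "measurement":   "feature review, proxy-attribute analysis",
    "aggregation":   "disaggregated targets and per-group evaluation",
    "deployment":    "population-shift checks before roll-out",
    "feedback_loop": "monitoring of the data the system generates over time",
}

@dataclass
class BiasRiskEntry:
    """One identified bias risk, recorded per use case rather than globally."""
    use_case: str
    entry_point: str      # key into BIAS_ENTRY_POINTS
    description: str
    mitigation_surface: str = ""

    def __post_init__(self) -> None:
        if self.entry_point not in BIAS_ENTRY_POINTS:
            raise ValueError(f"unknown bias entry point: {self.entry_point}")
        if not self.mitigation_surface:
            self.mitigation_surface = BIAS_ENTRY_POINTS[self.entry_point]

# Illustrative entry for a hypothetical credit-scoring use case.
register = [
    BiasRiskEntry(
        use_case="credit_scoring",
        entry_point="feedback_loop",
        description="Only approved applicants generate repayment outcomes, "
                    "so future training data over-represents approvals.",
    )
]
```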
Dataset audits and representational analysis
Pre-training analysis of training and validation datasets — sub-group representation, feature distribution per group, label distribution per group, coverage gaps, intersectional sparsity, and the proxy attributes that carry sensitive information even when the protected attribute is absent. Datasets that fail this audit are remediated, augmented, re-sampled, or rejected before they reach training — not corrected post-hoc through the harder mechanism of model surgery.
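A minimal sketch of the representation part of such an audit, assuming a pandas DataFrame with a binary label column and one or more group columns; the column names, the minimum-cell-count threshold, and the commented usage are illustrative.

```python
import pandas as pd

def representation_audit(df: pd.DataFrame, group_cols: list[str],
                         label_col: str, min_cell_count: int = 50) -> pd.DataFrame:
    """Sub-group size, share of the dataset, and label rate, with sparse cells flagged.

    Grouping on several columns at once surfaces intersectional sparsity
    (e.g. gender x age_band) that single-attribute views hide.
    """
    audit = (
        df.groupby(group_cols)
          .agg(n=(label_col, "size"), positive_rate=(label_col, "mean"))
          .reset_index()
    )
    audit["share"] = audit["n"] / len(df)
    audit["sparse"] = audit["n"] < min_cell_count   # coverage-gap flag
    return audit.sort_values("n")

# Illustrative usage on a hypothetical training set:
# audit = representation_audit(train_df, ["gender", "age_band"], "label")
# print(audit[audit["sparse"]])   # intersections too thin to train or evaluate on
```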
Fairness metrics — selected explicitly per use case
Demographic parity, equalised odds, equal opportunity, predictive parity, calibration within groups, individual fairness via similarity, counterfactual fairness, and the disparate-impact ratios required by sectoral regulation. Saiph engagements select the metrics that match the use case — different metrics for credit, employment, healthcare, criminal-justice analogues, and customer-facing personalisation — and document the trade-offs the choice implies. Where metrics conflict, the conflict is surfaced rather than averaged into a single composite that hides it.
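The sketch below computes two of the listed metrics, demographic parity difference and equalised-odds difference, directly from predictions; the function names and toy arrays are illustrative, and libraries such as Fairlearn offer equivalent implementations.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Largest gap in positive-prediction (selection) rate between any two groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equalised_odds_difference(y_true, y_pred, group):
    """Largest gap in true-positive or false-positive rate between any two groups."""
    tprs, fprs = [], []
    for g in np.unique(group):
        mask = group == g
        tprs.append(y_pred[mask & (y_true == 1)].mean())   # TPR for group g
        fprs.append(y_pred[mask & (y_true == 0)].mean())   # FPR for group g
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Toy example: equal selection rates (parity holds) can coexist with unequal
# error rates (equalised odds violated), one reason metric choice matters.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(demographic_parity_difference(y_pred, group))        # -> 0.0
print(equalised_odds_difference(y_true, y_pred, group))    # -> 0.33...
```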
Mitigation across pre-processing, in-processing, and post-processing
Mitigation engineered at the right point in the lifecycle. Pre-processing — re-sampling, re-weighting, feature transformations, synthetic augmentation through Xafe-generated balanced datasets, and proxy-attribute analysis. In-processing — fairness-constrained optimisation, adversarial debiasing, regularisation against group disparity. Post-processing — threshold calibration per group, reject-option classification, and equalised-odds post-processing. The chosen point depends on the data, the model, the use case, and the regulatory constraints — not on whichever tool happens to be installed already.
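As one example from the pre-processing bucket, the sketch below derives re-weighting factors that break the association between group and label, in the spirit of the reweighing technique of Kamiran and Calders; the column names and commented usage are assumptions, and the weights would be passed to whichever trainer accepts sample weights.

```python
import pandas as pd

def reweighing_weights(df: pd.DataFrame, group_col: str, label_col: str) -> pd.Series:
    """Per-row weights that make group and label statistically independent.

    weight(g, y) = P(group = g) * P(label = y) / P(group = g, label = y)
    Over-represented (group, label) cells are down-weighted and under-represented
    cells up-weighted, so the weighted data carries no group-label association.
    """
    p_group = df[group_col].value_counts(normalize=True)
    p_label = df[label_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, label_col]).size() / len(df)

    def row_weight(row) -> float:
        g, y = row[group_col], row[label_col]
        return (p_group[g] * p_label[y]) / p_joint[(g, y)]

    return df.apply(row_weight, axis=1)

# Illustrative usage with any estimator that accepts sample weights:
# weights = reweighing_weights(train_df, "gender", "label")
# model.fit(X_train, y_train, sample_weight=weights.to_numpy())
```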
Generative-AI and agentic-system fairness
Fairness evaluation for the generative and agentic surfaces, where the failure modes differ from classical ML — representational harms in generated content, stereotype reinforcement, allocation effects in agent decisions, demographic skew in retrieval, and the differential reliability of LLM outputs across languages, dialects, and cultural contexts. Evaluation harnesses include red-team prompt suites, sub-group benchmark performance, retrieval-fairness analysis, and the output-level audit surfaces that enterprise GenAI deployments require under the EU AI Act and customer risk frameworks.
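A minimal sketch of a sub-group prompt suite, assuming a `generate` call that wraps the deployed model and a `score_response` rubric; both are stubbed here so the harness runs end to end, and the templates, descriptors, and scoring are placeholders for what a real suite would cover.

```python
from itertools import product
from statistics import mean

# Templates and demographic descriptors are illustrative; a real suite covers
# far more templates, languages, dialects, and cultural contexts.
TEMPLATES = [
    "Write a short reference letter for {descriptor} software engineer.",
    "Summarise the career prospects of {descriptor} nurse.",
]
DESCRIPTORS = ["a young", "an older", "a female", "a male", "a non-native-speaking"]

def generate(prompt: str) -> str:
    """Stub standing in for the deployed model or agent being evaluated."""
    return f"[model output for: {prompt}]"

def score_response(text: str) -> float:
    """Stub rubric; a real suite scores sentiment, refusals, or stereotype content."""
    return float(len(text))

def subgroup_benchmark() -> dict[str, float]:
    """Mean score per descriptor; large gaps flag differential reliability or skew."""
    scores: dict[str, list[float]] = {d: [] for d in DESCRIPTORS}
    for template, descriptor in product(TEMPLATES, DESCRIPTORS):
        response = generate(template.format(descriptor=descriptor))
        scores[descriptor].append(score_response(response))
    return {d: mean(s) for d, s in scores.items()}

print(subgroup_benchmark())
```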
Continuous fairness monitoring in production
Production instrumentation that measures fairness metrics on live decisions — sub-group performance, disparate-impact ratios, threshold drift, feedback-loop disparity, and the early-warning signals that bias regression has begun. Alerts and incident workflows are wired into the same operating model as the rest of model monitoring, with named owners, severity ladders, and remediation runbooks. The one-off pre-release audit, never repeated, is the failure pattern this discipline is engineered to replace.
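A minimal sketch of how one of those signals, the disparate-impact ratio, might be tracked on a sliding window of live decisions and wired into an alert; the window size, decision shape, and `raise_alert` hook are assumptions, and the 0.8 threshold echoes the common four-fifths rule.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Decision:
    group: str        # protected-attribute value or proxy segment
    selected: bool    # positive outcome, e.g. approved or shortlisted

class DisparateImpactMonitor:
    """Sliding-window disparate-impact ratio over live decisions."""

    def __init__(self, window: int = 5000, threshold: float = 0.8):
        self.decisions: deque[Decision] = deque(maxlen=window)
        self.threshold = threshold    # four-fifths rule as the default alarm line

    def record(self, decision: Decision) -> None:
        self.decisions.append(decision)
        ratio = self.impact_ratio()
        if ratio is not None and ratio < self.threshold:
            self.raise_alert(ratio)

    def impact_ratio(self) -> float | None:
        """Minimum selection rate divided by maximum selection rate across groups."""
        by_group: dict[str, list[int]] = {}
        for d in self.decisions:
            by_group.setdefault(d.group, []).append(int(d.selected))
        rates = {g: sum(v) / len(v) for g, v in by_group.items()}
        if len(rates) < 2 or max(rates.values()) == 0:
            return None
        return min(rates.values()) / max(rates.values())

    def raise_alert(self, ratio: float) -> None:
        """Hook for the incident workflow: named owner, severity ladder, runbook."""
        print(f"ALERT: disparate-impact ratio {ratio:.2f} below {self.threshold}")

# Illustrative wiring into a decision path:
# monitor = DisparateImpactMonitor()
# monitor.record(Decision(group="b", selected=False))
```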