Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/5823
Title: Fairness-aware data preprocessing for classification tasks
Authors: González Zelaya, Carlos Vladimiro
Issue Date: 2022
Publisher: Newcastle University
Abstract: The prevalence of decision-making mechanisms in life-impacting settings, ranging from bank loans and college admissions to probation decisions, makes understanding and controlling the fairness of algorithmically generated decisions indispensable. This thesis presents an introduction to algorithmic fairness, focusing on classification tasks. A survey of state-of-the-art fairness-correcting methods is presented, emphasising data preprocessing solutions. The thesis’ research aim is to design, implement and evaluate data preprocessing methods that correct unfair predictions in classification tasks. Three such methods are presented, all of them agnostic to both the fairness definition and the classifier. For each method, experiments are performed on widely used benchmark datasets.

FAIRPIPES is a genetic-algorithm method that optimises for user-defined combinations of multiple definitions of fairness and accuracy, providing flexibility in the fairness/accuracy trade-off. FAIRPIPES heuristically searches a large space of pipeline configurations, achieving near-optimality efficiently and presenting the user with an estimate of the best attainable fairness/accuracy trade-offs. The optimal pipelines are shown to differ across datasets, suggesting that no “universal best” pipeline exists and confirming that FAIRPIPES fills a niche in the fairness-aware AutoML space.

PARDS is a parametrised data sampling method that optimises the fairness ratios observed on a dataset, in a way that is agnostic to both the specific fairness definition and the chosen classification model. Given a dataset with one binary protected attribute and a binary label, PARDS corrects the positive rate of both the favoured and unfavoured groups by resampling the training set. PARDS is shown to produce fairness-optimal predictions with a small loss in predictive power.

FAIR-MDAV is a fairness-correcting preprocessing method with privacy guarantees. It outperforms existing fairness-correcting methods on its equalised odds/accuracy trade-off and is competitive on its demographic parity/accuracy trade-off. FAIR-MDAV is modular, allowing privacy guarantees to be set separately from the fairness correction.
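To make the FAIRPIPES objective concrete, the following is a minimal sketch of the kind of user-weighted fairness/accuracy fitness a genetic pipeline search could use to score candidate pipelines. It is an illustration, not the thesis' implementation: the function names, the choice of a single fairness definition (a demographic parity ratio), and the linear weighting are all assumptions.

    # Illustrative only: a user-weighted fairness/accuracy fitness of the
    # kind a genetic pipeline search could optimise. Names, the single
    # fairness definition, and the weighting are assumptions, not thesis code.
    import numpy as np
    from sklearn.metrics import accuracy_score

    def demographic_parity_ratio(y_pred, protected):
        """Ratio of group positive rates; 1.0 means parity."""
        rates = [y_pred[protected == g].mean() for g in np.unique(protected)]
        return min(rates) / max(rates)

    def fitness(y_true, y_pred, protected, w_fair=0.5):
        """Weighted blend of a fairness ratio and accuracy; higher is better."""
        fair = demographic_parity_ratio(np.asarray(y_pred), np.asarray(protected))
        acc = accuracy_score(y_true, y_pred)
        return w_fair * fair + (1 - w_fair) * acc

Here w_fair would be the user-defined knob on the fairness/accuracy trade-off; the thesis optimises combinations of multiple fairness definitions, while this sketch uses one for brevity.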
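The PARDS description above centres on correcting each protected group's positive rate by resampling the training set. As a rough sketch of that idea (again hypothetical: the function, its parameters, and the commented-out example columns are not taken from the thesis), a group-wise resampler might look like:

    # Illustrative only: resample each protected group so its positive rate
    # matches a chosen target, in the spirit of the resampling step the
    # abstract attributes to PARDS. Not the thesis' actual algorithm.
    import pandas as pd

    def resample_to_rate(df, protected, label, target_rate, seed=0):
        """Return a training set whose groups share the target positive rate."""
        parts = []
        for _, grp in df.groupby(protected):
            pos = grp[grp[label] == 1]
            neg = grp[grp[label] == 0]
            # Keep all negatives; over- or undersample positives so that
            # n_pos / (n_pos + len(neg)) is approximately target_rate.
            n_pos = round(target_rate * len(neg) / (1 - target_rate))
            parts.append(pd.concat([
                pos.sample(n=n_pos, replace=n_pos > len(pos), random_state=seed),
                neg,
            ]))
        return pd.concat(parts).sample(frac=1, random_state=seed)  # shuffle

    # Hypothetical usage: equalise both groups at the overall base rate.
    # fair_train = resample_to_rate(train, "sex", "label", train["label"].mean())

Varying target_rate per group is what makes such a scheme parametrisable: different targets yield different observed fairness ratios, independently of the downstream classifier.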
Description: PhD Thesis
URI: http://hdl.handle.net/10443/5823
Appears in Collections: School of Computing

Files in This Item:
File                          Size      Format
Gonzalez Zelaya C V 2022.pdf  4.69 MB   Adobe PDF
dspacelicence.pdf             43.82 kB  Adobe PDF