To address the growing concern of unfairness in Artificial Intelligence (AI), several bias mitigation algorithms have been introduced in prior research. Their capabilities are often evaluated on certain overly-used datasets without rigorously stress-testing them under simultaneous train and test distribution shifts. To address this, we investigate the fairness vulnerabilities of these algorithms across several distribution shift scenarios using synthetic data, to highlight scenarios where these algorithms do and don't work to encourage their trustworthy use. The paper makes three important contributions. Firstly, we propose a flexible pipeline called the Fairness Auditor to systematically stress-test bias mitigation algorithms using multiple synthetic datasets with shifts. Secondly, we introduce the Deviation Metric for measuring the fairness and utility performance of these algorithms under such shifts. Thirdly, we propose an interactive reporting tool for comparing algorithmic performance across various synthetic datasets, mitigation algorithms and metrics called the Fairness Report.