Some Biases in Data Science
Data Scientists are beset by potential biases, which can lead to skewed analysis results and predictions. Some of these biases, such as biased data and algorithm bias, have received extensive coverage in the press and popular science books. More specific biases, such as selection bias, metric bias, and reporting bias, are less widely reported. Lesser-known biases, such as pool bias, affect only sub-areas of Data Science. This talk gives a rapid overview of these biases, with examples of where they occur and what effect they have. It also discusses how the combination of these biases influences the reliability of published results, along with various paths toward improved awareness and reduction of bias.
Allan Hanbury is Professor for Data Intelligence at TU Wien and a faculty member of the Complexity Science Hub Vienna. He was the scientific coordinator of the EU-funded Khresmoi Integrated Project on medical and health information search and analysis, and is a co-founder of contextflow, the spin-off company commercialising the radiology image search technology developed in the Khresmoi project. His research is in the general area of Data Science, with a focus on Information Retrieval and Evaluation.