Date: 7th of June, 2021
Time: 16:00 (CET)
Title: Uncovering Unknown Unknowns in Machine Learning
Abstract: The performance of machine learning (ML) models depends on both the learning algorithms and the data used for training and evaluation. The role of the algorithms is well studied and is the focus of a multitude of challenges, such as SQuAD, GLUE, ImageNet, and many others. There have also been efforts to improve the data, including a series of workshops addressing issues in ML evaluation. In contrast, research and challenges that focus on the data used for evaluating ML models are not commonplace. Furthermore, many evaluation datasets contain items that are easy to evaluate, e.g., photos with a subject that is easy to identify, and thus they miss the natural ambiguity of real-world contexts. The absence of ambiguous real-world examples in evaluation undermines our ability to reliably test machine learning performance, which leaves ML models prone to developing “weak spots”: classes of examples that are difficult or impossible to evaluate accurately because they are missing from the evaluation set. To address the problem of identifying these weaknesses in ML models, we recently launched the Crowdsourcing Adverse Test Sets for Machine Learning (CATS4ML) Data Challenge at HCOMP 2020, open to researchers and developers worldwide. The goal of the challenge is to raise the bar for ML evaluation sets and to find as many examples as possible that are confusing or otherwise problematic for algorithms to process. CATS4ML relies on people’s abilities and intuition to spot new data examples that ML models classify with high confidence but actually misclassify. This first edition of the CATS4ML Data Challenge focuses on visual recognition, using images and labels from the Open Images Dataset. The target images for the challenge are selected from the Open Images Dataset, along with a set of 24 target labels from the same dataset.
Challenge participants are invited to invent new and creative ways to explore this existing, publicly available dataset and, focusing on a list of pre-selected target labels, to discover examples of unknown unknowns for ML models. For more details, read the blog post: https://ai.googleblog.com/2021/02/uncovering-unknown-unknowns-in-machine.html
Speaker Biography: Lora Aroyo is a Research Scientist at Google, NY, currently working on the quality of human-labeled data. She is best known for her work on the CrowdTruth crowdsourcing methodology. Throughout her career, Lora has been a principal investigator of a large number of research projects bringing together methods and tools from human computation, linked (open) data, data science, and human-computer interaction, with the goal of building hybrid human-AI systems for understanding text, images, and videos with humans in the loop. Her research projects focusing on personalized access to online multimedia have had a major impact and established her as a recognized leader in human computation techniques for digital humanities, cultural heritage, and interactive TV. Prior to joining Google, she was a Full Professor of Computer Science at the VU University Amsterdam and Chief Scientist at the NY-based startup Tagasauris. She is a four-time recipient of the IBM Faculty Award for her work on CrowdTruth, used in adapting the IBM Watson system to the medical domain and in capturing ambiguity in understanding misinformation. She is currently president of User Modeling Inc., which acts as the steering committee for the UMAP conference series.