Date: 1 Feb 2021
Time: 16:00 (CET)
Title: Everyone wants to do the model work, not the data work: Human-Data Interaction in AI
Abstract: AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries an elevated significance in high-stakes AI due to its heightened downstream impact on living beings. Paradoxically, data is the most under-valued and de-glamorized aspect of AI. In this paper, we report on data practices in high-stakes AI, from interviews with 53 AI practitioners in India, East and West African countries, and the USA. We define and report on Data Cascades: compounding events causing negative, downstream effects from data issues, triggered by conventional AI/ML practices that undervalue data quality. Data cascades are pervasive (92% prevalence), invisible, delayed, but often avoidable. Data cascades demonstrate how broken incentives for data quality impact vulnerable groups: tigers alive or dead, cancer diagnosed or not. We also discuss findings on data collectors and raters, paying attention to their values, processes, and tools. We discuss opportunities for HCI to design and incentivize data excellence, moving from a reactive to a proactive focus on data work and workers in AI, resulting in safer and more robust systems for all.
Speaker Biography: Nithya Sambasivan is a Staff Researcher at PAIR and leads the HCI-AI group at Google Research India, Bangalore. Nithya's current research focuses on using HCI techniques in developing responsible AI in India, with a focus on marginalized communities. Specific sub-areas are data, fairness, privacy and abuse, and consent. She publishes in the fields of HCI, ICTD, and Privacy/Security. Nithya's long-standing research agenda has been on HCI and under-represented communities in the Global South. She has a PhD in Informatics from UC Irvine and a Master's in HCI from Georgia Tech.