GailBot

Researchers rely on detailed transcriptions of language use, including paralinguistic features such as overlaps, prosody, and intonation, which are time-consuming to create manually. No current speech-to-text (STT) system captures these features. GailBot, a new program, combines STT services with plugins to generate transcripts that follow Conversation Analysis standards, and it lets researchers add new plugins or improve existing ones. This page describes GailBot's architecture, heuristics, and machine learning components, and how it improves on existing transcription software.

Project Dates: 8/1/18 - Present

Date Published: 6/23/24

Overview

GailBot, developed by the Human Interaction Lab at Tufts University, advances dialogue systems for human-robot interaction. Its algorithms cover turn construction, duration-based silence classification, detection of overlapping speech from timing and speaker identity, syllable-rate estimation, and neural-network-based laughter detection. These features aim to support more naturalistic interaction, contributing to fields such as artificial intelligence and machine learning. Explore our Human Interaction Lab for research details and visit the project package page. The code repository for this project is private, but we can share more details upon request.

Please fill out this form to request access to GailBot and join our Slack community.

Introduction

Researchers in fields such as conversation analysis, psychology, and linguistics require detailed transcriptions of language use, including paralinguistic features such as overlaps, prosody, and intonation. Manual transcription is time-consuming, and no existing speech-to-text (STT) system captures these features. GailBot addresses this gap by combining STT services with plugins to automatically generate first drafts of transcripts that adhere to Conversation Analysis standards. It also allows researchers to add new plugins or improve existing ones. The paper describes GailBot’s architecture and its use of computational heuristics and machine learning, and it evaluates GailBot’s output against human transcribers and other automated systems. Despite its limitations, GailBot represents a significant improvement in dialogue transcription software.

Architecture and Algorithms

Architecture

GailBot’s architecture includes an API with two main components: an organizer and a core pipeline. The organizer takes media files and previous GailBot outputs and creates conversation objects, each paired with a settings profile. The core pipeline processes these objects in three sequential steps: transcription, analysis, and formatting into Conversation Analysis (CA) transcripts.
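The organizer-then-pipeline flow described above can be sketched roughly as follows. This is an illustrative outline only, not GailBot's actual API: the names `Conversation`, `organize`, `run_pipeline`, and the three stage functions are hypothetical, and each stage is a placeholder for the real work.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Conversation:
    """Hypothetical conversation object: one source plus its settings profile."""
    source_path: str
    settings: Dict[str, str]
    results: Dict[str, str] = field(default_factory=dict)


def organize(paths: List[str], settings: Dict[str, str]) -> List[Conversation]:
    """Organizer step: wrap each source in a conversation object."""
    return [Conversation(path, dict(settings)) for path in paths]


def transcribe(conv: Conversation) -> None:
    # Placeholder for a call to an external STT service.
    conv.results["transcription"] = f"utterances from {conv.source_path}"


def analyze(conv: Conversation) -> None:
    # Placeholder for running the configured plugins.
    conv.results["analysis"] = "plugin output"


def format_transcript(conv: Conversation) -> None:
    # Placeholder for rendering a CA-style transcript.
    conv.results["format"] = "CA transcript"


# Assumed fixed stage order, mirroring the three steps named in the text.
PIPELINE = (transcribe, analyze, format_transcript)


def run_pipeline(conversations: List[Conversation]) -> List[Conversation]:
    """Core pipeline: run every stage, in order, on every conversation."""
    for conv in conversations:
        for stage in PIPELINE:
            stage(conv)
    return conversations


if __name__ == "__main__":
    done = run_pipeline(organize(["a.wav", "b.wav"], {"engine": "demo"}))
    for conv in done:
        print(conv.source_path, list(conv.results))
```

Keeping the stages as plain functions in a tuple makes the sequential order explicit; a real implementation would likely carry richer data between stages.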

Plugins

Plugins are customizable algorithms that identify specific paralinguistic features. They interact with the core pipeline and can be configured and extended by users.
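As a rough illustration of what a user-extensible plugin could look like, the sketch below defines a small base class and one plugin that groups consecutive utterances by the same speaker into turns. The `Plugin` base class, the `apply` method, the utterance dictionary layout, and `run_plugins` are all assumptions made for this example; they are not taken from GailBot's plugin API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Assumed utterance layout: {"speaker": "A", "start": 0.0, "end": 0.4, "text": "so"}
Utterance = Dict[str, Any]


class Plugin(ABC):
    """Hypothetical plugin interface: take utterances, return annotated utterances."""
    name = "plugin"

    @abstractmethod
    def apply(self, utterances: List[Utterance]) -> List[Utterance]:
        ...


class TurnConstruction(Plugin):
    """Merge consecutive utterances by the same speaker into a single turn."""
    name = "turn_construction"

    def apply(self, utterances: List[Utterance]) -> List[Utterance]:
        turns: List[Utterance] = []
        for utt in utterances:
            if turns and turns[-1]["speaker"] == utt["speaker"]:
                # Same speaker continues: extend the current turn.
                turns[-1]["end"] = utt["end"]
                turns[-1]["text"] += " " + utt["text"]
            else:
                # Speaker change: start a new turn (copy so the input is untouched).
                turns.append(dict(utt))
        return turns


def run_plugins(plugins: List[Plugin], utterances: List[Utterance]) -> List[Utterance]:
    """Apply each plugin in order, feeding the output of one into the next."""
    for plugin in plugins:
        utterances = plugin.apply(utterances)
    return utterances


if __name__ == "__main__":
    words = [
        {"speaker": "A", "start": 0.0, "end": 0.4, "text": "so"},
        {"speaker": "A", "start": 0.5, "end": 0.9, "text": "anyway"},
        {"speaker": "B", "start": 1.2, "end": 1.5, "text": "yeah"},
    ]
    for turn in run_plugins([TurnConstruction()], words):
        print(turn["speaker"], turn["text"])
```

Under this design, adding a new feature means writing another subclass and adding it to the list passed to `run_plugins`.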

Algorithms

Algorithm Name      | Description
Turn Construction   | Identifies turn transitions using speaker changes.
Silences            | Classifies silences as micropause, pause, or gap based on duration.
Overlapping Speech  | Detects overlaps using speaker identity and utterance timing.
Syllable Rate       | Estimates syllable rate and identifies fast or slow speech using median absolute deviation.
Laughter Detection  | Uses a neural network to identify laughter segments.
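Three of the heuristics listed above are simple enough to sketch directly. The code below is a minimal illustration, not GailBot's implementation: the duration thresholds, the deviation multiplier `k`, the utterance dictionary layout, and all function names are assumptions chosen for the example.

```python
from statistics import median
from typing import Any, Dict, List, Tuple

# Illustrative thresholds in seconds (assumed, not GailBot's actual values).
MICROPAUSE_MAX = 0.2
PAUSE_MAX = 1.0


def classify_silence(duration: float) -> str:
    """Label a silence as micropause, pause, or gap by its duration alone."""
    if duration <= MICROPAUSE_MAX:
        return "micropause"
    if duration <= PAUSE_MAX:
        return "pause"
    return "gap"


def find_overlaps(utterances: List[Dict[str, Any]]) -> List[Tuple[float, float]]:
    """Return (start, end) spans where two different speakers talk at once."""
    ordered = sorted(utterances, key=lambda u: u["start"])
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second["start"] >= first["end"]:
                break  # sorted by start, so nothing later can overlap `first`
            if second["speaker"] != first["speaker"]:
                overlaps.append((second["start"], min(first["end"], second["end"])))
    return overlaps


def flag_speech_rates(rates: List[float], k: float = 2.0) -> List[str]:
    """Mark rates more than k median absolute deviations from the median."""
    med = median(rates)
    mad = median([abs(rate - med) for rate in rates])
    labels = []
    for rate in rates:
        if mad and rate > med + k * mad:
            labels.append("fast")
        elif mad and rate < med - k * mad:
            labels.append("slow")
        else:
            labels.append("normal")
    return labels


if __name__ == "__main__":
    print([classify_silence(d) for d in (0.1, 0.5, 2.0)])
    print(find_overlaps([
        {"speaker": "A", "start": 0.0, "end": 2.0},
        {"speaker": "B", "start": 1.5, "end": 3.0},
    ]))
    print(flag_speech_rates([4.0, 4.2, 3.9, 4.1, 8.0, 1.0]))
```

The median absolute deviation is used here rather than the standard deviation because a handful of very fast or very slow utterances would otherwise inflate the spread and hide themselves.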

Performance

GailBot’s performance was evaluated on several corpora of varying audio quality and compared with human transcriptions and other automated systems. Key metrics included Word Error Rate (WER), overlap and silence accuracy, and transcription time savings. GailBot was found to significantly reduce transcription time and to produce useful first-draft CA transcripts, although it has limitations in speaker diarization and in identifying exact overlap markers and silence durations.
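For readers unfamiliar with WER, it is the number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard word-level edit distance; the function name and the lowercase, whitespace-based tokenization are assumptions for illustration and not necessarily how the evaluation normalized its transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.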

Discussion

GailBot provides an extensible framework for transcribing paralinguistic features and generating draft CA transcripts. Future improvements may include data-driven models and work to address biases in automatic speech recognition (ASR) systems. GailBot facilitates the creation of large-scale CA corpora, enabling new research opportunities across the social and computational sciences.

Acknowledgements

The development of GailBot was supported by AFOSR grant FA9550-18-1-0465, the School of Arts & Sciences, and the School of Engineering at Tufts University.

I extend my deepest gratitude to our main collaborators for their integral contributions to GailBot’s development. As the lead developer at Tufts University, I focused on enhancing naturalistic turn-taking in dialogue systems. Dr. Julia Mertens’ insights into communication dynamics, Dr. Saul Albert’s pioneering work in conversation analysis and AI, and J.P. De Ruiter’s guidance as Principal Investigator have been instrumental. Their expertise has shaped GailBot into a versatile research tool.

Undergraduate Contributors

I extend my heartfelt gratitude to all the undergraduate contributors who have been instrumental in the development of GailBot. Your dedication, creativity, and hard work have significantly enriched our project, from refining the application’s functionalities to implementing cutting-edge features. Each of you has played a crucial role in advancing our research goals, and your contributions will undoubtedly leave a lasting impact on the field of human-robot interaction and dialogue systems. Thank you for your commitment and invaluable efforts throughout this journey.

Student Name      | Dates
Vivian (Yike) Li  | Fall 2022, Spring 2023, Spring 2024
Hannah Shader     | Fall 2023, Spring 2024, Summer 2024
Sophie Clemens    | Summer 2024
Daniel Bergen     | Summer 2024
Eva Caro          | Summer 2024
Marti Zentmaier   | Summer 2024
Erin Sarlak       | Spring 2024
Joanne Fan        | Spring 2024
Riddhi Sahni      | Spring 2024
Lakshita Jain     | Fall 2023
Anya Bhatia       | Fall 2023
Jason Wu          | Summer 2023
Jacob Boyar       | Summer 2023
Siara Small       | Fall 2022, Spring 2023
Annika Tanner     | Spring 2022
Muyin Yao         | Spring 2022
Rosanna Vitiello  | Spring 2021
Eva Denman        | Spring 2021