Do LLMs Model Spoken Dialogue Accurately?
Transformer-based Large Language Models (LLMs) have shown impressive performance on a variety of language tasks, but it remains unclear how well they predict spoken language in natural interaction. Spoken and written language differ substantially in syntax, pragmatics, and interactional norms. While LLMs capture linguistic regularities statistically, they may not grasp the normative structure of interactive spoken language. This study evaluates LLMs, specifically GPT-2 variants, on spoken dialogue. After fine-tuning the models on transcripts of English dialogue, we assessed whether their predictions are sensitive to speaker identity. Although the models did make use of speaker identity, they occasionally hallucinated speaker transitions, apparently to make their continuations more plausible. We conclude that LLMs generate norm-conforming text but do not yet accurately replicate human conversational behavior.
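The abstract does not specify how speaker identity was encoded for fine-tuning; one common approach is to serialize speaker-labeled transcripts with explicit speaker tokens so the model can condition its predictions on who is speaking. A minimal sketch of that idea (the `serialize_dialogue` helper and the `<A>`/`<B>` token format are illustrative assumptions, not taken from the paper):

```python
# Hypothetical preprocessing sketch: turn speaker-labeled dialogue turns
# into a single training string with explicit speaker tokens, so that a
# fine-tuned language model can condition on speaker identity at each turn.

def serialize_dialogue(turns):
    """turns: list of (speaker, utterance) pairs -> one training string."""
    return "\n".join(f"<{speaker}> {utterance}" for speaker, utterance in turns)

dialogue = [
    ("A", "did you see the game last night"),
    ("B", "yeah it went to overtime"),
    ("A", "I fell asleep before the end"),
]

print(serialize_dialogue(dialogue))
```

Under this encoding, a "hallucinated transition" of the kind the abstract describes would be the model emitting a speaker token such as `<B>` at a point where, in the actual transcript, the same speaker continues.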