Automatically modeling conversations as processes of interrelated speech Intentions

Abstract : The proliferation of digital data has enabled scientific and practitioner communities to create new data-driven technologies that learn about user behaviors in order to deliver better services and support to people in their digital experience. The majority of these technologies derive value from data logs passively generated during human-computer interaction. A particularity of these behavioral traces is that they are structured. However, the text pro-actively generated across the Internet is highly unstructured and represents the overwhelming majority of behavioral traces. To date, despite its prevalence and the relevance of behavioral knowledge to many domains, such as recommender systems, cyber-security and social network analysis, digital text is still insufficiently treated as traces of human behavior from which extensive insights into behavior can be automatically revealed. The main objective of this thesis is to propose a corpus-independent method to automatically exploit asynchronous communication as pro-actively generated behavior traces in order to discover process models of conversations, centered on comprehensive speech intentions and relations. The solution is built in three iterations, following a design science approach. Multiple original contributions are made. The only systematic study to date on the automatic modeling of asynchronous communication with speech intentions is conducted. A speech intention taxonomy is derived from linguistics to model asynchronous communication; compared to all taxonomies from the related works, it is corpus-independent and comprehensive (both finer-grained and exhaustive in the given context), and its application by non-experts is proven feasible through extensive experiments. A corpus-independent, automatic method to annotate utterances of asynchronous communication with the proposed speech intention taxonomy is designed based on supervised machine learning.
For this, validated ground-truth corpora are created and groups of features (discourse, content and conversation-related) are engineered to be used by the classifiers. In particular, some of the discourse features are novel and defined by considering the linguistic means of expressing speech intentions, without relying on the corpus's explicit content, its domain or the specificities of the asynchronous communication types. Then, an automatic method based on process mining is designed to generate process models of interrelated speech intentions from conversation turns annotated with multiple speech intentions per sentence. As process mining relies on well-defined structured event logs, an algorithm to produce such logs from conversations is proposed. Additionally, an extensive design rationale on how conversations annotated with multiple labels per sentence can be transformed into event logs, and on how different decisions affect the output behavioral models, is released to support future research. Experiments and qualitative validations in medicine and conversation analysis show that the proposed solution yields reliable and relevant results; limitations are also identified, to be addressed in future work.
Document type :
Theses
Cited literature [241 references]

https://hal-paris1.archives-ouvertes.fr/tel-02021609
Contributor : Abes Star
Submitted on : Monday, October 21, 2019 - 9:55:10 AM
Last modification on : Tuesday, October 22, 2019 - 1:27:18 AM

File

EPURE.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-02021609, version 2

Citation

Elena Epure. Automatically modeling conversations as processes of interrelated speech Intentions. Computation and Language [cs.CL]. Université Panthéon-Sorbonne - Paris I, 2018. English. ⟨NNT : 2018PA01E068⟩. ⟨tel-02021609v2⟩
