Kickoff Conference SPP RATIO: 23-25 April 2018
Results
The kick-off conference of the DFG-funded priority program “Robust Argumentation Machines” took place from April 23rd to 25th, 2018, at the Center for Interdisciplinary Research (ZiF) at Bielefeld University. Thirteen projects funded under the program presented their research plans. The conference program also featured two invited talks, given by Serena Villata and Chris Reed.
At the end of the second day, four well-known experts in the field of argumentation exchanged their views on open problems in the field in an expert panel: Chris Reed, Serena Villata, Manfred Stede, and Iryna Gurevych. We provide a brief snapshot of the main views that were expressed in reaction to four questions:
1. Concerning unresolved questions in the field of argumentation, the participating experts identified several unsettled issues, which are summarized in the following. Chris Reed mentioned that we need a more precise way of judging arguments and their impact; studying the ethos of arguments and their influence on an argumentation process is a crucial issue. Other panelists mentioned the need to work on methods that can judge the consistency of argumentation structure, identify fallacies in argumentation, and rank arguments according to their (semantic) strength. Serena Villata mentioned that paying attention to information dynamics and to the evolution of arguments is crucial. Iryna Gurevych mentioned that we need more and better training datasets so that state-of-the-art machine learning methods can be trained to identify and reason over arguments. Finally, several panelists highlighted that we need to figure out how to represent world knowledge and to create methods for deep NLP (i.e. deep language and background understanding), whilst considering the domain in which reasoning takes place. Furthermore, with regard to context, we need to explore what happens beneath the linguistic surface in order to provide suitable frameworks for the underlying structures and concepts, depending on the specific task.
2. Concerning the question of how to achieve a certain level, or scalability, of robustness¹ in argumentation machines, some experts agreed that obtaining robustness in the academic world is possible by investing in infrastructure and through consistency and commitment. Outside of academia, there are not many examples of scalable argumentation systems. Robust annotation schemes that can be applied consistently by different annotators are needed, since there is no clear and inherent definition of what an argument is (e.g. arguments can be partial and involve uncertainty). Some panelists pointed out that scalability issues in argument mining do not differ qualitatively from scalability issues in other NLP tasks. Some panelists believed that argumentation is too complex (e.g. given the variability of arguments) for a single commonly agreed-upon model. It was argued that we would either need to lower our standards or accept that we will not get far with robustness.
3. With regard to experiences with annotation – as to what has worked well and what has not – it was pointed out that the effort of developing datasets annotated for arguments has often been underestimated, even though it constitutes a very important task. Annotation is certainly time-consuming, since it requires training, and “error-prone”, since the task is inherently subjective. A common problem when annotating data is disagreement among annotators about what counts as an argument in a text; annotation therefore frequently requires expert intervention and discussion with the respective annotators. When annotating specialized texts, experts first need to understand the terms of the argumentation theory they are applying.
Crowdsourcing and gaming could be useful paradigms to speed up the annotation process, although the latter would require extensive infrastructure development. One piece of advice is to reuse existing datasets as much as possible and to share them within the community. The available data can also be used for annotating new data by bootstrapping the process with machine learning (a minimal sketch of such a bootstrapping step is given after this list).
4. Regarding what the shared tasks for the SPP should be, it was observed that agreeing on a set of shared tasks might imply a certain risk of reduced diversity and creativity. There is also the risk of unduly focusing on tasks that distract from the main argumentation task. It was proposed to define a common dataset to work on, but not to perform the same task on it, in order to have a diversity of tasks and, at the same time, a common basis for discussion. It was concluded that the shared tasks should not be too specific, but at the same time clear enough to be reusable by different groups.
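As a concrete illustration of the bootstrapping idea mentioned under item 3, the following minimal sketch trains an off-the-shelf classifier on an existing argument-annotated seed corpus and uses it to propose high-confidence “silver” labels for unannotated text, which annotators would then verify. The example sentences, the scikit-learn model choice, and the confidence threshold are illustrative assumptions, not part of the panel discussion.

# Sketch: bootstrapping new argument annotations from existing labeled data.
# All sentences, the classifier, and the threshold are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Seed corpus: sentences already annotated as argumentative (1) or not (0).
seed_texts = [
    "We should ban the additive because it has been linked to health risks.",
    "Therefore, the proposal must be rejected.",
    "The meeting starts at 9 am.",
    "The report was published in 2017.",
]
seed_labels = [1, 1, 0, 0]

# Unannotated pool from which new "silver" annotations are bootstrapped.
pool = [
    "This policy is wrong since it contradicts the available evidence.",
    "It rained for most of the afternoon.",
]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(seed_texts, seed_labels)

# Keep confident predictions as candidate annotations; route the rest to humans.
THRESHOLD = 0.7
for text, probs in zip(pool, model.predict_proba(pool)):
    label, confidence = int(probs.argmax()), float(probs.max())
    if confidence >= THRESHOLD:
        print(f"silver label={label} (p={confidence:.2f}): {text}")
    else:
        print(f"uncertain, send to annotators (p={confidence:.2f}): {text}")

In such a loop, only the confident predictions would be handed to annotators for verification or correction, so that each round of annotation enlarges the training data available for the next one.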
¹ Robustness in the sense that a system is robust for the original purpose it was created for.