Argumentation Mining Literature on Scientific Discourse

Compiled by Tirthankar Ghosal, and included in 'Argument Mining for Scholarly Document Processing: Taking Stock and Looking Ahead.'
Reference Domain Objectives Methods Additional Contribution
Manual Argument Analysis
Nancy L Green. 2015b. Annotating evidence-based argumentation in biomedical text. In 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 922–929. IEEE. Biomedical articles Analyzed evidence-based arguments in four full-text articles on genetic variants that may cause human health problems and created a preliminary catalog of argumentation schemes
Nancy Green. 2017a. Manual identification of arguments with implicit conclusions using semantic rules for argument mining. In Proceedings of the 4th Workshop on Argument Mining, pages 73–78. Biomedical articles Evaluate human analysts’ ability to identify the argumentation scheme and premises of an argument having an implicit conclusion    
Nancy Green. 2018a. Proposed method for annotation of scientific arguments in terms of semantic relations and argument schemes. In Proceedings of the 5th Workshop on Argument Mining, pages 105–110. Biomedical Genetics articles Provide a method for semantic representation of arguments that can be used in empirical studies of scientific discourse as well as to support applications such as argument mining
Heather Graves, Roger Graves, Robert E Mercer, and Mahzereen Akter. 2014. Titles that announce argumentative claims in biomedical research articles. In Proceedings of the First Workshop on Argumentation Mining, pages 98–99. Biomedical articles Analyses article titles as a potential source of claims and finds that the frequency of verbs in titles of experimental research articles has increased over time
Corpus Creation and New Annotation Schemes
Nancy Green. 2014. Towards creation of a corpus for argumentation mining the biomedical genetics research literature. In Proceedings of the First Workshop on Argumentation Mining, pages 11–18. Biomedical Genetics articles Argument annotation scheme: Premise (Data, Warrant) and Conclusion   Theoretical challenges in creating an argument corpus
Nancy Green. 2015a. Identifying argumentation schemes in genetics research articles. In Proceedings of the 2nd Workshop on Argumentation Mining, pages 12–21. Biomedical Genetics articles Identification of argumentation schemes with specification of ten semantically distinct argumentation schemes   Annotation guidelines for argumentation corpora
Simone Teufel and Marc Moens. 1999. Discourse-level argumentation in scientific articles: human and automatic annotation. In Towards Standards and Tools for Discourse Tagging. Chemistry, Computational Linguistics Detect argument zones in scientific articles Proposed a scheme and annotated 15 argument zone categories for 39 papers (5,374 sentences)
Christian Kirschner, Judith Eckle-Kohler, and Iryna Gurevych. 2015. Linking the thoughts: Analysis of argumentation structures in scientific publications. In Proceedings of the 2nd Workshop on Argumentation Mining, pages 1–11. Scientific articles (Educational and Developmental Psychology) New annotation scheme to identify argumentative relations: support, attack, detail, sequence Study of the annotation strategy across 24 articles, an annotation tool, a new graph-based inter-annotator agreement measure
Anne Lauscher, Goran Glavaš, and Simone Paolo Ponzetto. 2018b. An argument-annotated corpus of scientific publications. In Proceedings of the 5th Workshop on Argument Mining, pages 40–46. Computer Graphics scientific publications Proposed a new argument-annotated dataset of scientific publications Adapted Toulmin’s model for argumentative components: Background Claim, Own Claim, Data. Relation between argumentative components: support, contradicts, same claim Investigation of the link between the argumentative nature of scientific publications and rhetorical aspects such as discourse categories or citation contexts.
Mohammed Alliheedi, Robert E Mercer, and Robin Cohen. 2019. Annotation of rhetorical moves in biochemistry articles. In Proceedings of the 6th Workshop on Argument Mining, pages 113–123. Biochemistry articles Determine rhetorical moves in the argument structure of biomedical articles Annotated method sections of 105 text files based on a new annotation scheme for identifying the structured representation of knowledge in a set of sentences describing the experimental procedures
Yufan Guo, Ilona Silins, Roi Reichart, and Anna Korhonen. 2012. Crab reader: A tool for analysis and visualization of argumentative zones in scientific literature. In Proceedings of COLING 2012: Demonstration Papers, pages 183–190. Biomedical papers Introduce a tool for analyzing and visualizing argument structure (based on AZ) that also facilitates expert AZ annotation Used HTML, JavaScript, PHP, XML for the annotation tool; SVM classifier using features from Guo et al. (2011) Interactive annotation via active learning; CRAB Reader allows the user to define AZ schemes; AZ can be performed at the word, sentence, paragraph, or document level
An Yang and Sujian Li. 2018. SciDTB: Discourse dependency TreeBank for scientific abstracts. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 444–449, Melbourne, Australia. Association for Computational Linguistics. Scientific abstracts from ACL Anthology Construct a domain-specific discourse treebank annotated on scientific articles 798 segmented abstracts were labelled by 5 annotators in 6 months. 506 abstracts were annotated more than twice separately by different annotators. In total, SciDTB contains 798 unique abstracts with 63% labelled more than once and 18,978 discourse relations.
Automatic Argument Unit Identification
Nancy L Green. 2017b. Argumentation mining in scientific discourse. In CMNA@ICAIL, pages 7–13. Biomedical, Biological articles Argumentation extraction Semantic rule-based approach Demonstrates the need for a richer model of inter-argument relationships in biomedical/biological research articles.
Anne Lauscher, Goran Glavaš, and Kai Eckert. 2018a. ArguminSci: A tool for analyzing argumentation and rhetorical aspects in scientific writing. Association for Computational Linguistics. Computer Graphics scientific publications A toolkit for rhetorical analysis covering argument component identification, discourse role classification, subjective aspect classification, citation context classification, and summary relevance classification Token-level sequence labelling, sentence-level classification using Bi-LSTM Command-line tool, RESTful API, web application
Anne Lauscher, Goran Glavaš, Simone Paolo Ponzetto, and Kai Eckert. 2018c. Investigating the role of argumentation in the rhetorical analysis of scientific publications with neural multi-task learning models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3326–3338. Computer Graphics scientific publications Proposed two neural multi-task learning (MTL) models for argumentative analysis based on the tasks in (Lauscher et al., 2018a) Bi-LSTM based simple MTL model for sentence-level classification, hierarchical MTL for sequence labelling Adapted Toulmin’s model for argumentative components: Background Claim, Own Claim, Data. Relation between argumentative components: support, contradicts, same claim
Simone Teufel. 2014. Scientific argumentation detection as limited-domain intention recognition. In ArgNLP. Chemistry, Computational Linguistics, Agriculture Views scientific argumentation detection as limited-domain intent recognition Model based on recognition of 28 rhetorical moves in text
Yufan Guo, Anna Korhonen, and Thierry Poibeau. 2011. A weakly-supervised approach to argumentative zoning of scientific documents. In Empirical Methods in Natural Language Processing (EMNLP). Biomedical abstracts Investigating a weakly-supervised approach for AZ detection when a limited amount of training data is available Features like location, word bi-gram, verb, verb cues, PoS, grammatical relations, subj/obj, voice are used with ASVM, ASSVM, TSVM, SSCRF Concludes that the location of AZs is particularly important; gives directions to facilitate easy porting of AZ schemes to new NLP tasks and domains
Xiangci Li, Gully Burns, and Nanyun Peng. 2019. Scientific discourse tagging for evidence extraction. arXiv e-prints, pages arXiv–1909. Biomedical publications Automatic evidence extraction using scientific discourse tagging based on classification by de Waard et al. (2009) sentence-level sequential labelling using BiLSTM-CRF + Attention Leveraging scientific discourse tagging for evidence fragment detection
Titipat Achakulvisut, Chandra Bhagavatula, Daniel Acuna, and Konrad Kording. 2019. Claim extraction in biomedical publications using deep discourse model and transfer learning. arXiv preprint arXiv:1907.00962. Biomedical abstracts Automated claim extraction Neural discourse tagging model based on a pre-trained BiLSTM+CRF followed by transfer learning and fine-tuning on an expert-annotated dataset New dataset of 1,500 expert-annotated biomedical abstracts indicating whether each sentence presents a scientific claim.
Hospice Houngbo and Robert E Mercer. 2014. An automated method to build a corpus of rhetorically-classified sentences in biomedical texts. In Proceedings of the First Workshop on Argumentation Mining, pages 19–23. Biomedical articles Identify the components of IMRaD rhetorical structure in biomedical papers Applied a few heuristics to construct a corpus and used machine learning techniques (Naive Bayes and SVM) to classify sentences into Method, Result or Conclusion
José María González Pinto, Serkan Celik, and Wolf-Tilo Balke. 2019. Learning to rank claim-evidence pairs to assist scientific-based argumentation. In International Conference on Theory and Practice of Digital Libraries, pages 41–55. Springer. Biomedical papers Claim-evidence matching as a learning-to-rank problem where the goal is to find evidence in the form of a paper to make a natural language claim appear credible; to assist scientific argumentation Rhetoric Classification Task and Claim-Evidence Rank Task using NB-BoW, SVM-BoW, CNN on data from a Wikipedia dump with word2vec trained on PubMed Central UMLS, SemMedDB databases Adding "prestige" metadata features for a paper improved performance; to rank claim-evidence pairs, a model should account for other semantic properties beyond simple content-matching
Syeed Ibn Faiz and Robert E Mercer. 2014. Extracting higher order relations from biomedical text. In Proceedings of the First Workshop on Argumentation Mining, pages 100–101. Biomedical papers Extraction of connections or "higher order relations" between biomedical relations (relationships between biomedical entities). The higher order relation conveys a causal sense, which indicates that the latter relation causes the earlier one. In the first stage, the authors use a discourse relation parser to extract the explicit discourse relations from text. In the second stage, the authors analyze each extracted explicit discourse relation to determine whether it can produce a higher order relation. Pilot evaluation on AIMed corpus for protein-protein interaction prediction: identify the full argument extent which contains the biomedical entities
Antonio Jimeno Yepes, James G Mork, and Alan R Aronson. 2013. Using the argumentative structure of scientific literature to improve information access. In Proceedings of the 2013 Workshop on Biomedical Natural Language Processing, pages 102–110. MEDLINE/PubMed abstracts An evaluation of several learning algorithms to label abstract text with argumentative labels, based on structured abstracts available in MEDLINE/PubMed Naive Bayes, SVM, Logistic Regression, CRF, AdaBoostM1 as classifiers for the argumentation labels on abstract text. In addition to textual features, the position of the sentence or paragraph from the beginning of the abstract is used A data set to compare and evaluate GeneRIF indexing approaches. The sentence annotations are: Expression, Function, Isolation, NonGeneRIF, Other, Reference, and Structure on MEDLINE articles.
Automatic Argument Structure Identification
Christian Stab, Christian Kirschner, Judith Eckle-Kohler, and Iryna Gurevych. 2014. Argumentation mining in persuasive essays and scientific articles from the discourse structure perspective. In Proceedings of the Workshop on Frontiers and Connections between Argumentation Theory and Natural Language Processing, Forlì-Cesena, Italy, July 21-25, 2014, volume 1341 of CEUR Workshop Proceedings. CEUR-WS.org. Scientific articles Identification of argumentation structures Argument unit identification and relation extraction An evaluation dataset of 20 scientific full-texts annotated with argument relations ‘support’, ‘attack’, ‘sequence’
Valéria D Feltrim, Simone Teufel, Maria Graças V das Nunes, and Sandra M Aluísio. 2006. Argumentative zoning applied to critiquing novices’ scientific abstracts. In Computing Attitude and Affect in Text: Theory and Applications, pages 233–246. Springer. Brazilian PhD Theses A system to detect argumentative structures in text The annotation scheme has the following rhetorical categories: Background, Gap, Purpose, Methodology, Results, Conclusion and Outline. A Naive Bayes classifier to identify the argumentative units Porting of Argumentative Zoning (AZ) from English to Portuguese. A pilot system to demonstrate the effectiveness of AZ for a critiquing tool to support academic writing
Pablo Accuosto and Horacio Saggion. 2020. Mining arguments in scientific abstracts with discourse-level embeddings. Data & Knowledge Engineering, 129:101840. Computational linguistics abstracts Argument unit identification and relation extraction Explore two transfer learning approaches in which discourse parsing is used as an auxiliary task when training argument mining models Propose a new annotation schema and use it to augment a corpus of computational linguistics abstracts that had previously been annotated with discourse units and relations
Ningyuan Song, Hanghang Cheng, Huimin Zhou, and Xiaoguang Wang. 2019. Argument structure mining in scientific articles: a comparative analysis. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pages 339–340. IEEE. Information Science and Biomedical articles Apply sequential pattern mining to analyse the common argument structure in two scientific domains (Information science and biomedical science)
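Note on the sequential pattern mining mentioned in the Song et al. (2019) entry: the sketch below is a brute-force illustration of the general idea, not the authors' implementation, which the table does not describe. It assumes each document has already been reduced to an ordered list of rhetorical labels; the function name `frequent_patterns` and the toy inputs are hypothetical, with label names borrowed from the Feltrim et al. (2006) row.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(sequences, length, min_support):
    """Return {pattern: support} for label patterns of a fixed length.

    A sequence supports a pattern when the pattern's labels occur in it
    in the same order, not necessarily adjacent. Support counts
    sequences, not occurrences.
    """
    support = Counter()
    for seq in sequences:
        # Each ordered choice of `length` positions gives one candidate;
        # the set makes a sequence count a given pattern at most once.
        candidates = {
            tuple(seq[i] for i in idx)
            for idx in combinations(range(len(seq)), length)
        }
        support.update(candidates)
    return {p: n for p, n in support.items() if n >= min_support}


# Hypothetical sentence-level labels for three toy abstracts.
abstracts = [
    ["Background", "Gap", "Purpose", "Methodology", "Results", "Conclusion"],
    ["Background", "Purpose", "Methodology", "Results"],
    ["Purpose", "Methodology", "Results", "Conclusion"],
]

patterns = frequent_patterns(abstracts, length=3, min_support=3)
```

With these toy inputs, only (Purpose, Methodology, Results) is shared by all three sequences. Enumerating every ordered subset is exponential in pattern length, so a real study would more plausibly use a pruning algorithm such as PrefixSpan; the brute-force version is used here only because it makes the definition of support explicit.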
Applications
Pablo Accuosto and Horacio Saggion. 2019. Transferring knowledge from discourse to arguments: A case study with scientific abstracts. In Proceedings of the 6th Workshop on Argument Mining, pages 41–51, Florence, Italy. Association for Computational Linguistics. Computational Linguistics abstracts Leverage existing discourse parsing RST annotations (Stede et al., 2017) to identify argumentative components and relations Transfer learning to improve the performance of argument mining tasks trained with a small corpus of 60 abstracts by leveraging the discourse annotations available in the full SciDTB (Yang and Li, 2018) corpus; sequence labelling task with dependency-based word embeddings, contextualized ELMo, RST encodings, GloVe Enrich a subset of SciDTB with an additional layer of argumentation, EDUs as the minimal span for annotation, pilot task to predict acceptance/rejection using automatically identified argumentative components and relations
Danish Contractor, Yufan Guo, and Anna Korhonen. 2012. Using argumentative zones for extractive summarization of scientific articles. In Proceedings of COLING 2012, pages 663–678. Biomedical papers Leveraging AZ features for extractive summarization of scientific articles Used AZ categories as features in the final sentence selection process, and additionally used verbs, tf-idf, citation and reference occurrences, and locative features for classification to generate an initial set of candidate sentences. Then performed k-means clustering to group similar sentences and selected the centroid from each group to generate the summary (redundancy elimination) Demonstrated the efficacy of the weakly-supervised AZ classifier of Guo et al. (2011), which needs less training data, for scientific article summary extraction
Simone Teufel and Marc Moens. 2002. Summarizing scientific articles: experiments with relevance and rhetorical status. Computational linguistics, 28(4):409–445. Computational Linguistics papers Summarize scientific articles by concentrating on the rhetorical status of statements in an article Developed an algorithm to select content from articles and classify them into rhetorical categories which integrate argumentation structure in scientific papers
Valéria D Feltrim and Simone Teufel. 2004. Automatic critiquing of novices’ scientific writing using argumentative zoning. In Proc. AAAI spring symposium exploring affect and attitude in text. Brazilian PhD Theses in Computer Science Integrated Argumentative Zoning into an automatic Critiquing Tool for Scientific Writing in Portuguese (SciPo) Implemented a set of 7 features, derived from the 16 used by Teufel and Moens (2002), with Naive Bayes as the classifier Port the feature detection stage of AZ from English to Portuguese, a human annotation experiment to verify the reproducibility of the annotation scheme, intrinsic evaluation of the AZ part of SciPo
Anita de Waard, S Buckingham Shum, Annamaria Carusi, Jack Park, Matthias Samwald, and Ágnes Sándor. 2009. Hypotheses, evidence and relationships: The HypER approach for representing scientific knowledge claims. Production and Manufacturing, Biomedical, Law/Legal Proposal to extract knowledge from articles to allow the construction of a system where a specific scientific claim is connected, through trails of meaningful relationships, to experimental evidence. To improve access to collections of scientific papers represented as networks of collections of claims that have a defined epistemic value, with links to experimental evidence and argumentative relationships to other statements and evidence. The authors coin this conceptual approach ‘Hypotheses, Evidence and Relationships’ (HypER).
Tudor Groza, Siegfried Handschuh, and Stefan Decker. 2011. Capturing rhetoric and argumentation aspects within scientific publications. In Journal on Data Semantics XV, pages 1–36. Springer. The authors present SALT (Semantically Annotated LaTeX), a semantic authoring framework that enables the externalization of the argumentation and rhetoric captured in a scientific publication’s content. The annotation framework is a layered organization of three ontologies: the Document Ontology, capturing the linear structure of the publication; the Rhetorical Ontology, modeling the rhetoric and argumentation; and the Annotation Ontology, linking the rhetoric and argumentation to the publication’s structure and content. A LaTeX and MS-Word plugin for semantic annotation of scientific publications as per the SALT scheme
Bei Yu, Jun Wang, Lu Guo, and Yingya Li. 2020. Measuring correlation-to-causation exaggeration in press releases. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4860–4872, Barcelona, Spain (Online). International Committee on Computational Linguistics. PubMed papers and news articles Study exaggeration in press releases Developed a new corpus and trained models that can identify causal claims in the main statements in a press release. By comparing the claims made in a press release with the corresponding claims in the original research paper, the authors found that 22% of press releases made exaggerated causal claims from correlational findings in observational studies.
Xiangci Li, Gully Burns, and Nanyun Peng. 2021. Scientific discourse tagging for evidence extraction. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2550–2562, Online. Association for Computational Linguistics. Biomedical papers Demonstrate the benefit of leveraging scientific discourse tags for downstream tasks such as claim extraction and evidence fragment detection Develop a sentence-level sequence tagging model to label discourse types for each sentence in a paragraph
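
Note on the redundancy-elimination step in the Contractor et al. (2012) entry under Applications: the table describes clustering candidate sentences with k-means and keeping one representative per cluster. The following is a minimal sketch of that idea only. It assumes plain word-count vectors in place of the AZ, tf-idf, and citation features the entry lists, uses the first k vectors as initial centroids to keep the run deterministic, and picks the sentence closest to each centroid; all function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(sentences):
    """Map each sentence to a word-count vector (assumed feature set)."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vectors = []
    for s in sentences:
        counts = Counter(s.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with the first k vectors as initial centroids."""
    k = min(k, len(vectors))
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def select_representatives(sentences, k):
    """Keep, per cluster, the sentence nearest to the cluster centroid."""
    vectors = vectorize(sentences)
    centroids, assignment = kmeans(vectors, k)
    chosen = []
    for c in range(len(centroids)):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(min(members, key=lambda i: dist(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]


# Toy candidate sentences forming two obvious groups.
candidates = [
    "alpha beta",
    "gamma delta",
    "alpha beta beta",
    "alpha beta beta beta",
    "gamma delta delta",
    "gamma delta delta delta",
]
summary = select_representatives(candidates, k=2)
```

Representatives are returned in their original order so that the extract preserves the flow of the source text. First-k initialization is a simplification; it can leave a cluster empty when the first k sentences are near-duplicates, in which case fewer than k sentences are returned.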