Search results

POS tagset refinement for linguistic analysis and the impact on statistical parsing

Author: Rehbein, Ines

Published: 2014

Publisher: Institut für Deutsche Sprache, Bibliothek, Mannheim

Access:

Resolving system

Long-term archiving, German National Library

Publisher (free of charge)

Source:	DNB subject group German language and literature
Contributors:	Hirschmann, Hagen (author); Henrich, Verena (editor); Hinrichs, Erhard (editor); Kok, Daniël de (editor); Osenova, Petya (editor); Przepiórkowski, Adam (editor)
Language:	English
Media type:	Unspecified
Format:	Online
Other identifiers:	urn: urn:nbn:de:bsz:mh39-80368
Keywords:	Corpus <linguistics>; Parts of speech; Syntactic analysis; Annotation
Extent:	Online resource
Note(s):	In: Proceedings of the Thirteenth International Workshop on Treebanks and Linguistic Theories (TLT13). December 12-13, 2014, Tübingen, Germany. - Tübingen : University of Tübingen, 2014, pp. 172-183, ISBN 978-3-9809183-9-8

The KiezDeutsch Korpus (KiDKo) Release 1.0

Author: Rehbein, Ines ; Schalowski, Sören ; Wiese, Heike ; Calzolari, Nicoletta ; Choukri, Khalid ; Declerck, Thierry ; Loftsson, Hrafn ; Maegaard, Bente ; Mariani, Joseph ; Moreno, Asunción ; Odijk, Jan ; Piperidis, Stelios

Published: 2016

Full text:	https://d-nb.info/1136662294/34 https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5599
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-55999

Source:	BASE, German studies subset
Language:	English
Media type:	Unspecified
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	spoken language corpora; urban youth language; Kiezdeutsch
License:	free of charge

The KiezDeutsch Korpus (KiDKo) Release 1.0

Author: Rehbein, Ines

Published: 2016

Publisher: Institut für Deutsche Sprache, Bibliothek, Mannheim

Access:

Resolving system

Long-term archiving, German National Library

Publisher (free of charge)

Source:	DNB subject group German language and literature
Contributors:	Schalowski, Sören (author); Wiese, Heike (author); Calzolari, Nicoletta (editor); Choukri, Khalid (editor); Declerck, Thierry (editor); Loftsson, Hrafn (editor); Maegaard, Bente (editor); Mariani, Joseph (editor); Moreno, Asunción (editor); Odijk, Jan (editor); Piperidis, Stelios (editor)
Language:	English
Media type:	Unspecified
Format:	Online
Other identifiers:	urn: urn:nbn:de:bsz:mh39-55999
Keywords:	Spoken language; Urban dialect; Youth language; Multicultural society; Corpus <linguistics>
Additional keywords:	spoken language corpora; urban youth language; Kiezdeutsch
Extent:	Online resource
Note(s):	In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). May 26-31, 2014. Harpa Concert Hall and Conference Center. Reykjavik, Iceland. - Paris : European Language Resources Association, 2014, pp. 3927-3934, ISBN 978-2-9517408-8-4

The KiezDeutsch Korpus (KiDKo) Release 1.0

Author: Rehbein, Ines

Published: 2016

Publisher: Institut für Deutsche Sprache, Bibliothek, Mannheim

Access:

Resolving system

Long-term archiving, German National Library

Publisher (free of charge)

Source:	Union catalogues
Contributors:	Schalowski, Sören (author); Wiese, Heike (author); Calzolari, Nicoletta (editor); Choukri, Khalid (editor); Declerck, Thierry (editor); Loftsson, Hrafn (editor); Maegaard, Bente (editor); Mariani, Joseph (editor); Moreno, Asunción (editor); Odijk, Jan (editor); Piperidis, Stelios (editor)
Language:	English
Media type:	Book (monograph)
Format:	Online
Other identifiers:	urn: urn:nbn:de:bsz:mh39-55999
DDC classification:	Germanic languages; German (430)
Keywords:	Spoken language; Urban dialect; Youth language; Multicultural society; Corpus <linguistics>
Additional keywords:	spoken language corpora; urban youth language; Kiezdeutsch
Extent:	Online resource
Note(s):	In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). May 26-31, 2014. Harpa Concert Hall and Conference Center. Reykjavik, Iceland. - Paris : European Language Resources Association, 2014, pp. 3927-3934, ISBN 978-2-9517408-8-4

Is it worth the effort? Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation

Author: Rehbein, Ines ; Ruppenhofer, Josef ; Sporleder, Caroline

Published: 2016

Publisher: Heidelberg/New York : Springer

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4975 https://ids-pub.bsz-bw.de/files/4975/Ruppenhofer_Rehbein_is_it_worth_the_effort_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-49750 https://doi.org/10.1007/s10579-011-9170-z

Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. For obtaining these annotations, a large amount of manual effort is needed, making the creation of these resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and ask human annotators to correct them where necessary. However, it is not clear to what extent these automatic pre-annotations are successful in reducing human annotation effort, and what impact they have on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotation of differing quality on annotation time, consistency and accuracy. While we found no conclusive evidence that it can speed up human annotation, we found that automatic pre-annotation does increase its overall quality.

Source:	BASE, German studies subset
Language:	English
Media type:	Journal article
Format:	Online
DDC classification:	Linguistics (410)
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Adding nominal spice to SALSA – frame-semantic annotation of German nouns and verbs

Author: Rehbein, Ines ; Ruppenhofer, Josef ; Sporleder, Caroline ; Pinkal, Manfred

Published: 2016

Publisher: Eigenverlag ÖGAI

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5254 https://ids-pub.bsz-bw.de/files/5254/Rehbein_Ruppenhofer_Sporleder_Pinkal_Adding_nominal_spice_to_SALSA_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-52542

This paper presents Release 2.0 of the SALSA corpus, a German resource for lexical semantics. The new corpus release provides new annotations for German nouns, complementing the existing annotations of German verbs in Release 1.0. The corpus now includes around 24,000 sentences with more than 36,000 annotated instances. It was designed with an eye towards NLP applications such as semantic role labeling but will also be a useful resource for linguistic studies in lexical semantics.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	German; Corpus; Frame semantics
License:	opendatacommons.org/licenses/by/1.0/ ; info:eu-repo/semantics/openAccess

Yes we can!? Annotating the senses of English modal verbs

Author: Ruppenhofer, Josef ; Rehbein, Ines

Published: 2016

Publisher: Paris : European Language Resources Association (ELRA)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5255 https://ids-pub.bsz-bw.de/files/5255/Ruppenhofer_Rehbein_Yes_we_can_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-52557

This paper presents an annotation scheme for English modal verbs together with sense-annotated data from the news domain. We describe our annotation scheme and discuss problematic cases for modality annotation based on the inter-annotator agreement during the annotation. Furthermore, we present experiments on automatic sense tagging, showing that our annotations do provide a valuable training resource for NLP systems.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	English; Modal verb; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Semantic frames as an anchor representation for sentiment analysis

Author: Ruppenhofer, Josef ; Rehbein, Ines

Published: 2016

Publisher: Stroudsburg : Association for Computational Linguistics

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5256 https://ids-pub.bsz-bw.de/files/5256/Ruppenhofer_Rehbein_Semantic_frames_as_an_anchor_representation_for_sentiment_analysis_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-52565

Current work on sentiment analysis is characterized by approaches with a pragmatic focus, which use shallow techniques in the interest of robustness but often rely on ad-hoc creation of data sets and methods. We argue that progress towards deep analysis depends on a) enriching shallow representations with linguistically motivated, rich information, and b) focussing different branches of research and combining resources to create synergies with related work in NLP. In the paper, we propose SentiFrameNet, an extension to FrameNet, as a novel representation for sentiment analysis that is tailored to these aims.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Frame semantics; Propositional attitude; Automatic text analysis
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Evaluating the Impact of Coder Errors on Active Learning

Author: Rehbein, Ines ; Ruppenhofer, Josef

Published: 2016

Publisher: Stroudsburg : Association for Computational Linguistics

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5292 https://ids-pub.bsz-bw.de/files/5292/Rehbein_Ruppenhofer_Evaluating_the_Impact_of_Coder_Errors_on_Active_Learning_2011.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-52929

Active Learning (AL) has been proposed as a technique to reduce the amount of annotated data needed in the context of supervised classification. While various simulation studies for a number of NLP tasks have shown that AL works well on gold standard data, there is some doubt whether the approach can be successful when applied to noisy, real-world data sets. This paper presents a thorough evaluation of the impact of annotation noise on AL and shows that systematic noise resulting from biased coder decisions can seriously harm the AL process. We present a method to filter out inconsistent annotations during AL and show that this makes AL far more robust when applied to noisy data.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

There’s no Data like More Data? Revisiting the Impact of Data Size on a Classification Task

Author: Rehbein, Ines ; Ruppenhofer, Josef

Published: 2016

Publisher: European Language Resources Association

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5293 https://ids-pub.bsz-bw.de/files/5293/Rehbein_Ruppenhofer_There%27s_no_Data_like_More_Data_2010.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-52935

In the paper we investigate the impact of data size on a Word Sense Disambiguation task (WSD). We question the assumption that the knowledge acquisition bottleneck, which is known as one of the major challenges for WSD, can be solved by simply obtaining more and more training data. Our case study on 1,000 manually annotated instances of the German verb drohen (threaten) shows that the best performance is not obtained when training on the full data set, but by carefully selecting new training instances with regard to their informativeness for the learning process (Active Learning). We present a thorough evaluation of the impact of different sampling methods on the data sets and propose an improved method for uncertainty sampling which dynamically adapts the selection of new instances to the learning progress of the classifier, resulting in more robust results during the initial stages of learning. A qualitative error analysis identifies problems for automatic WSD and discusses the reasons for the great gap in performance between human annotators and our automatic WSD system.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Document processing; Automatic language analysis; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Bringing Active Learning to Life

Author: Rehbein, Ines ; Ruppenhofer, Josef ; Palmer, Alexis

Published: 2016

Publisher: Beijing : Tsinghua University Press

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5294 https://ids-pub.bsz-bw.de/files/5294/Rehbein_Ruppenhofer_Palmer_Bringing_Active_Learning_to_Life_2010.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-52945

Active learning has been applied to different NLP tasks, with the aim of limiting the amount of time and cost for human annotation. Most studies on active learning have only simulated the annotation scenario, using prelabelled gold standard data. We present the first active learning experiment for Word Sense Disambiguation with human annotators in a realistic environment, using fine-grained sense distinctions, and investigate whether AL can reduce annotation cost and boost classifier performance when applied to a real-world task.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Computational linguistics; Annotation
License:	creativecommons.org/licenses/by-nc-sa/3.0/de ; info:eu-repo/semantics/openAccess

Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation

Author: Rehbein, Ines ; Ruppenhofer, Josef ; Sporleder, Caroline

Published: 2016

Publisher: The Association for Computational Linguistics and The Asian Federation of Natural Language Processing

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5308 https://ids-pub.bsz-bw.de/files/5308/Rehbein_Ruppenhofer_Sporleder_Assessing_the_benefits_of_partial_pre-labeling_for_frame-semantic_annotation_2009.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-53087

In this paper, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. While we found no conclusive evidence that it can speed up human annotation, automatic pre-annotation does increase its overall quality.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Frame semantics; Automatic language analysis; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

MaJo - A Toolkit for Supervised Word Sense Disambiguation and Active Learning

Author: Rehbein, Ines ; Ruppenhofer, Josef ; Sunde, Jonas

Published: 2016

Publisher: Milano : EDUCatt

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5309 https://ids-pub.bsz-bw.de/files/5309/Rehbein_Ruppenhofer_Sunde_Majo-a_toolkit_for_supervised_word_sense_disambiguation_and_active_learning_2009.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-53093

We present MaJo, a toolkit for supervised Word Sense Disambiguation (WSD), with an interface for Active Learning. Our toolkit combines a flexible plugin architecture which can easily be extended, with a graphical user interface which guides the user through the learning process. MaJo integrates off-the-shelf NLP tools like POS taggers, treebank-trained statistical parsers, as well as linguistic resources like WordNet and GermaNet. It enables the user to systematically explore the benefit gained from different feature types for WSD. In addition, MaJo provides an Active Learning environment, where the system presents carefully selected instances to a human oracle. The toolkit supports manual annotation of the selected instances and re-trains the system on the extended data set. MaJo also provides the means to evaluate the performance of the system against a gold standard. We illustrate the usefulness of our system by learning the frames (word senses) for three verbs from the SALSA corpus, a version of the TiGer treebank with an additional layer of frame-semantic annotation. We show how MaJo can be used to tune the feature set for specific target words and so improve performance for these targets. We also show that syntactic features, when carefully tuned to the target word, can lead to a substantial increase in performance.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

STTS goes Kiez – Experiments on Annotating and Tagging Urban Youth Language

Author: Rehbein, Ines ; Schalowski, Sören

Published: 2016

Publisher: Regensburg : GSCL

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5439 https://ids-pub.bsz-bw.de/files/5439/Rehbein_Schalowski_STTS_goes_Kiez_Experiments_on_Annotating_and_Tagging_Urban_Youth_Language_2013.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-54390

Source:	BASE, German studies subset
Language:	English
Media type:	Journal article
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Youth language; Natural language processing; Annotation; Spoken language
License:	creativecommons.org/licenses/by-sa/4.0/deed.de ; info:eu-repo/semantics/openAccess

Towards a syntactically motivated analysis of modifiers in German

Author: Rehbein, Ines ; Hirschmann, Hagen

Published: 2016

Publisher: Hildesheim : Universitätsverlag Hildesheim

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5597 https://ids-pub.bsz-bw.de/files/5597/Rehbein_Hirschmann_Towards_a_syntactically_motivated_analysis_of_modifiers_in_German_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-55975

The Stuttgart-Tübingen Tagset (STTS) is a widely used POS annotation scheme for German which provides 54 different tags for the analysis on the part of speech level. The tagset, however, does not distinguish between adverbs and different types of particles used for expressing modality, intensity, graduation, or to mark the focus of the sentence. In the paper, we present an extension to the STTS which provides tags for a more fine-grained analysis of modification, based on a syntactic perspective on parts of speech. We argue that the new classification not only enables us to do corpus-based linguistic studies on modification, but also improves statistical parsing. We give proof of concept by training a data-driven dependency parser on data from the TiGer treebank, providing the parser a) with the original STTS tags and b) with the new tags. Results show an improved labelled accuracy for the new, syntactically motivated classification.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Annotation; Automatic language analysis; Corpus
License:	creativecommons.org/licenses/by/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

POS error detection in automatically annotated corpora

Author: Rehbein, Ines

Published: 2016

Publisher: Stroudsburg, PA : ACL

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5598 https://ids-pub.bsz-bw.de/files/5598/Rehbein_POS_error_detection_in_automatically_annotated_corpora_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-55986

Recent work on error detection has shown that the quality of manually annotated corpora can be substantially improved by applying consistency checks to the data and automatically identifying incorrectly labelled instances. These methods, however, cannot be used for automatically annotated corpora where errors are systematic and cannot easily be identified by looking at the variance in the data. This paper targets the detection of POS errors in automatically annotated corpora, so-called silver standards, showing that by combining different measures sensitive to annotation quality we can identify a large part of the errors and obtain a substantial increase in accuracy.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Automatic language analysis; Annotation
License:	creativecommons.org/licenses/by/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

The KiezDeutsch Korpus (KiDKo) Release 1.0

Author: Rehbein, Ines ; Schalowski, Sören ; Wiese, Heike

Published: 2016

Publisher: Paris : European Language Resources Association

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5599 https://ids-pub.bsz-bw.de/files/5599/Rehbein_Schalowski_Wiese_The_KiezDeutsch_Korpus_KiDKo_Release_1_0_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-55999

This paper presents the first release of the KiezDeutsch Korpus (KiDKo), a new language resource with multiparty spoken dialogues of Kiezdeutsch, a newly emerging language variety spoken by adolescents from multi-ethnic urban areas in Germany. The first release of the corpus includes the transcriptions of the data as well as a normalisation layer and part-of-speech annotations. In the paper, we describe the main features of the new resource and then focus on automatic POS tagging of informal spoken language. Our tagger achieves an accuracy of nearly 97% on KiDKo. While we did not succeed in further improving the tagger using ensemble tagging, we present our approach to using the tagger ensembles for identifying error patterns in the automatically tagged data.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Spoken language; Urban dialect; Youth language; Multicultural society; Corpus
License:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

Discussing best practices for the annotation of Twitter microtext

Author: Rehbein, Ines ; Visser, Emiel ; Lestmann, Nadine

Published: 2016

Publisher: Sofia : Bulgarian Academy of Sciences

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5601 https://ids-pub.bsz-bw.de/files/5601/Rehbein_Visser_Lestmann_Discussing_best_practices_for_the_annotation_of_Twitter_microtext_2013.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-56013

This paper contributes to the discussion on best practices for the syntactic analysis of non-canonical language, focusing on Twitter microtext. We present an annotation experiment where we test an existing POS tagset, the Stuttgart-Tübingen Tagset (STTS), with respect to its applicability for annotating new text from social media, in particular from Twitter microblogs. We discuss different tagset extensions proposed in the literature and test our extended tagset on a set of 506 tweets (7,418 tokens) where we achieve an inter-annotator agreement for two human annotators in the range of 92.7 to 94.4 (κ). Our error analysis shows that especially the annotation of Twitter-specific phenomena such as hashtags and at-mentions causes disagreements between the human annotators. Following up on this, we provide a discussion of the different uses of the @- and #-marker in Twitter and argue against analysing both on the POS level by means of an at-mention or hashtag label. Instead, we sketch a syntactic analysis which describes these phenomena by means of syntactic categories and grammatical functions.
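
To illustrate how an agreement figure of the kind quoted in the abstract can be computed for two annotators, here is a minimal Python sketch of Cohen's kappa. The abstract does not say which kappa variant the authors used, so Cohen's kappa is an assumption. The tag sequences are invented toy data, not tweets from the paper, and `HASH` is a hypothetical hashtag label.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of tokens that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (STTS-style tags plus an invented HASH tag).
annotator_1 = ["NN", "VVFIN", "ART", "NN", "HASH", "NE"]
annotator_2 = ["NN", "VVFIN", "ART", "NE", "NN", "NE"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the two annotators disagree on exactly the hashtag token and one noun, which pulls kappa well below the raw agreement rate because chance agreement is subtracted out.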

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Syntactic analysis; Annotation; Twitter <software platform>
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Extending the STTS for the Annotation of Spoken Language

Author: Rehbein, Ines ; Schalowski, Sören

Published: 2016

Publisher: Wien : Eigenverlag ÖGAI

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5602 https://ids-pub.bsz-bw.de/files/5602/Rehbein_Schalowski_Extending_the_STTS_for_the_Annotation_of_Spoken_Language_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-56026

This paper presents an extension to the Stuttgart-Tübingen TagSet, the standard part-of-speech tag set for German, for the annotation of spoken language. The additional tags deal with hesitations, backchannel signals, interruptions, onomatopoeia and uninterpretable material. They allow one to capture phenomena specific to spoken language while, at the same time, preserving inter-operability with already existing corpora of written language.

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Spoken language; Annotation; Automatic language analysis; Interoperability
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Data point selection for self-training

Author: Rehbein, Ines

Published: 2016

Publisher: Stroudsburg, PA : Association for Computational

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5604 https://ids-pub.bsz-bw.de/files/5604/Rehbein_Data_point_selection_for_self_training_2011.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-56043

Problems for parsing morphologically rich languages are, amongst others, caused by the higher variability in structure due to less rigid word order constraints and by the higher number of different lexical forms. Both properties can result in sparse data problems for statistical parsing. We present a simple approach for addressing these issues. Our approach makes use of self-training on instances selected with regard to their similarity to the annotated data. Our similarity measure is based on the perplexity of part-of-speech trigrams of new instances measured against the annotated training data. Preliminary results show that our method outperforms a self-training setting where instances are simply selected by order of occurrence in the corpus, and we argue that self-training is a cheap and effective method for improving parsing accuracy for morphologically rich languages.
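
The selection idea in this abstract, scoring new sentences by the perplexity of their part-of-speech trigrams against the annotated data, can be sketched roughly as follows. This is not the author's implementation. Add-one smoothing, the sentence padding symbols, the choice to prefer the lowest-perplexity candidates, and all names (`TrigramModel`, `select_for_self_training`) are assumptions made for illustration, and the tag sequences are invented toy data.

```python
import math
from collections import Counter


def trigrams(tags):
    """POS trigrams of one sentence, padded with start/end symbols."""
    padded = ["<s>", "<s>"] + list(tags) + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


class TrigramModel:
    """POS trigram model with add-one smoothing (an assumed choice)."""

    def __init__(self, tagged_sentences):
        self.tri = Counter()
        self.bi = Counter()
        vocab = {"</s>"}
        for tags in tagged_sentences:
            vocab.update(tags)
            for gram in trigrams(tags):
                self.tri[gram] += 1
                self.bi[gram[:2]] += 1
        self.vocab_size = len(vocab)

    def perplexity(self, tags):
        """Per-trigram perplexity of a POS sequence under the model."""
        grams = trigrams(tags)
        log_prob = 0.0
        for gram in grams:
            p = (self.tri[gram] + 1) / (self.bi[gram[:2]] + self.vocab_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / len(grams))


def select_for_self_training(model, candidates, k):
    """Keep the k candidates most similar to the training data (lowest perplexity)."""
    return sorted(candidates, key=model.perplexity)[:k]


# Hypothetical toy data: POS sequences of annotated and unannotated sentences.
annotated = [["ART", "NN", "VVFIN"],
             ["ART", "ADJA", "NN", "VVFIN"],
             ["PPER", "VVFIN", "ART", "NN"]]
unannotated = [["ART", "NN", "VVFIN"],
               ["VVFIN", "VVFIN", "ITJ", "ITJ"]]
model = TrigramModel(annotated)
selected = select_for_self_training(model, unannotated, k=1)
```

In a self-training loop, the selected sentences would then be parsed automatically and added to the training set; that step is omitted here.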

Source:	BASE, German studies subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Sentence analysis; Automatic language analysis
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Hard constraints for grammatical function labelling

Author: Seeker, Wolfgang ; Rehbein, Ines ; Kuhn, Jonas ; van Genabith, Josef

Published: 2016

Publisher: Stroudsburg, PA : Association for Computational Linguistics

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5605 https://ids-pub.bsz-bw.de/files/5605/Seeker_Rehbein_Kuhn_Hard_Constraints_for_Grammatical_Function_Labelling_2010.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-56059

For languages with (semi-)free word order (such as German), labelling grammatical functions on top of phrase-structural constituent analyses is crucial for making them interpretable. Unfortunately, most statistical classifiers consider only local information for function labelling and fail to capture important restrictions on the distribution of core argument functions such as subject, object etc., namely that there is at most one subject (etc.) per clause. We augment a statistical classifier with an integer linear program imposing hard linguistic constraints on the solution space output by the classifier, capturing global distributional restrictions. We show that this improves labelling quality, in particular for argument grammatical functions, in an intrinsic evaluation, and, importantly, grammar coverage for treebank-based (Lexical-Functional) grammar acquisition and parsing, in an extrinsic evaluation.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Phrase structure; Automatic language analysis
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks

Author: Rehbein, Ines ; Scholman, Merel ; Demberg, Vera

Published: 2016

Publisher: Paris : European Language Resources Association

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5606 https://ids-pub.bsz-bw.de/files/5606/Rehbein_Scholman_Demberg_Annotating_Discourse_Relations_in_Spoken_Language_A_Comparison_of_the_PDTB_and_CCR_Frameworks_2016.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-56068

In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe our experiments on annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according to two different discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mapped onto one another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts, and find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 with respect to the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly the size of the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Spoken language; Annotation; Irish
License:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

Scalable Discriminative Parsing for German

Author: Versley, Yannick ; Rehbein, Ines

Published: 2016

Publisher: Stroudsburg, PA : Association for Computational Linguistics

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5608 https://ids-pub.bsz-bw.de/files/5608/Versley_Rehbein_Scalable_Discriminative_Parsing_for_German_2009.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-56080

Generative lexicalized parsing models, which are the mainstay for probabilistic parsing of English, do not perform as well when applied to languages with different language-specific properties such as free(r) word order or rich morphology. For German and other non-English languages, linguistically motivated complex treebank transformations have been shown to improve performance within the framework of PCFG parsing, while generative lexicalized models do not seem to be as easily adaptable to these languages. In this paper, we show a practical way to use grammatical functions as first-class citizens in a discriminative model that makes it possible to extend annotated treebank grammars with rich feature sets without suffering from sparse data problems. We demonstrate the flexibility of the approach by integrating unsupervised PP attachment and POS-based word clusters into the parser.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	German; Syntactic analysis; Automatic language analysis; Grammar
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess
