Ergebnisse für *

Es wurden 23 Ergebnisse gefunden.

Zeige Ergebnisse 1 bis 23 von 23.

Sortieren

  1. POS tagset refinement for linguistic analysis and the impact on statistical parsing
    Autor*in: Rehbein, Ines
    Erschienen: 2014
    Verlag:  Institut für Deutsche Sprache, Bibliothek, Mannheim

    Export in Literaturverwaltung
    Quelle: DNB Sachgruppe Deutsche Sprache und Literatur
    Beteiligt: Hirschmann, Hagen (Verfasser); Henrich, Verena (Herausgeber); Hinrichs, Erhard (Herausgeber); Kok, Daniël de (Herausgeber); Osenova, Petya (Herausgeber); Przepiórkowski, Adam (Herausgeber)
    Sprache: Englisch
    Medientyp: Unbestimmt
    Format: Online
    Weitere Identifier:
    Schlagworte: Korpus <Linguistik>; Parts of speech; Syntaktische Analyse; Annotation
    Umfang: Online-Ressource
    Bemerkung(en):

    In: Proceedings of the Thirteenth International Workshop on Treebanks and Linguistic Theories (TLT13). December 12-13, 2014, Tübingen, Germany. - Tübingen : University of Tübingen, 2014., S. 172-183, ISBN 978-3-9809183-9-8

  2. The KiezDeutsch Korpus (KiDKo) Release 1.0
  3. The KiezDeutsch Korpus (KiDKo) Release 1.0
    Autor*in: Rehbein, Ines
    Erschienen: 2016
    Verlag:  Institut für Deutsche Sprache, Bibliothek, Mannheim

    Export in Literaturverwaltung
    Quelle: DNB Sachgruppe Deutsche Sprache und Literatur
    Beteiligt: Schalowski, Sören (Verfasser); Wiese, Heike (Verfasser); Calzolari, Nicoletta (Herausgeber); Choukri, Khalid (Herausgeber); Declerck, Thierry (Herausgeber); Loftsson, Harfn (Herausgeber); Maegaard, Bente (Herausgeber); Mariani, Joseph (Herausgeber); Moreno, Asunción (Herausgeber); Odijk, Jan (Herausgeber); Piperidis, Stelios (Herausgeber)
    Sprache: Englisch
    Medientyp: Unbestimmt
    Format: Online
    Weitere Identifier:
    Schlagworte: Gesprochene Sprache; Stadtmundart; Jugendsprache; Multikulturelle Gesellschaft; Korpus <Linguistik>
    Weitere Schlagworte: spoken language corpora; urban youth language; Kiezdeutsch
    Umfang: Online-Ressource
    Bemerkung(en):

    In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). May 26-31, 2014. Harpa Concert Hall and Conference Center. Reykjavik, Iceland. - Paris : European Language Resources Association, 2014., S. 3927-3934, ISBN 978-2-9517408-8-4

  4. The KiezDeutsch Korpus (KiDKo) Release 1.0
    Autor*in: Rehbein, Ines
    Erschienen: 2016
    Verlag:  Institut für Deutsche Sprache, Bibliothek, Mannheim

    Export in Literaturverwaltung   RIS-Format
      BibTeX-Format
    Quelle: Verbundkataloge
    Beteiligt: Schalowski, Sören (Verfasser); Wiese, Heike (Verfasser); Calzolari, Nicoletta (Herausgeber); Choukri, Khalid (Herausgeber); Declerck, Thierry (Herausgeber); Loftsson, Harfn (Herausgeber); Maegaard, Bente (Herausgeber); Mariani, Joseph (Herausgeber); Moreno, Asunción (Herausgeber); Odijk, Jan (Herausgeber); Piperidis, Stelios (Herausgeber)
    Sprache: Englisch
    Medientyp: Buch (Monographie)
    Format: Online
    Weitere Identifier:
    DDC Klassifikation: Germanische Sprachen; Deutsch (430)
    Schlagworte: Gesprochene Sprache; Stadtmundart; Jugendsprache; Multikulturelle Gesellschaft; Korpus <Linguistik>
    Weitere Schlagworte: spoken language corpora; urban youth language; Kiezdeutsch
    Umfang: Online-Ressource
    Bemerkung(en):

    In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). May 26-31, 2014. Harpa Concert Hall and Conference Center. Reykjavik, Iceland. - Paris : European Language Resources Association, 2014., S. 3927-3934, ISBN 978-2-9517408-8-4

  5. Is it worth the effort? Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation
    Erschienen: 2016
    Verlag:  Heidelberg/New York : Springer

    Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. For obtaining these annotations, a large amount of manual effort is needed, making the creation of... mehr

     

    Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. For obtaining these annotations, a large amount of manual effort is needed, making the creation of these resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and ask human annotators to correct them where necessary. However, it is not clear to what extent these automatic pre-annotations are successful in reducing human annotation effort, and what impact they have on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotation of differing quality on annotation time, consistency and accuracy. While we found no conclusive evidence that it can speed up human annotation, we found that automatic pre-annotation does increase its overall quality.

     

    Export in Literaturverwaltung   RIS-Format
      BibTeX-Format
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Aufsatz aus einer Zeitschrift
    Format: Online
    DDC Klassifikation: Linguistik (410)
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  6. Adding nominal spice to SALSA – frame-semantic annotation of German nouns and verbs
    Erschienen: 2016
    Verlag:  Eigenverlag ÖGAI

    This paper presents Release 2.0 of the SALSA corpus, a German resource for lexical semantics. The new corpus release provides new annotations for German nouns, complementing the existing annotations of German verbs in Release 1.0. The corpus now... mehr

     

    This paper presents Release 2.0 of the SALSA corpus, a German resource for lexical semantics. The new corpus release provides new annotations for German nouns, complementing the existing annotations of German verbs in Release 1.0. The corpus now includes around 24,000 sentences with more than 36,000 annotated instances. It was designed with an eye towards NLP applications such as semantic role labeling but will also be a useful resource for linguistic studies in lexical semantics.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Linguistik (410)
    Schlagworte: Deutsch; Korpus; Frame-Semantik
    Lizenz:

    opendatacommons.org/licenses/by/1.0/ ; info:eu-repo/semantics/openAccess

  7. Yes we can!? Annotating the senses of English modal verbs
    Erschienen: 2016
    Verlag:  Paris : European Language Resources Association (ELRA)

    This paper presents an annotation scheme for English modal verbs together with sense-annotated data from the news domain. We describe our annotation scheme and discuss problematic cases for modality annotation based on the inter-annotator agreement... mehr

     

    This paper presents an annotation scheme for English modal verbs together with sense-annotated data from the news domain. We describe our annotation scheme and discuss problematic cases for modality annotation based on the inter-annotator agreement during the annotation. Furthermore, we present experiments on automatic sense tagging, showing that our annotations do provide a valuable training resource for NLP systems.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Linguistik (410)
    Schlagworte: Englisch; Modalverb; Annotation
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  8. Semantic frames as an anchor representation for sentiment analysis
    Erschienen: 2016
    Verlag:  Stroudsburg : Association for Computational Linguistics

    Current work on sentiment analysis is characterized by approaches with a pragmatic focus, which use shallow techniques in the interest of robustness but often rely on ad-hoc creation of data sets and methods. We argue that progress towards deep... mehr

     

    Current work on sentiment analysis is characterized by approaches with a pragmatic focus, which use shallow techniques in the interest of robustness but often rely on ad-hoc creation of data sets and methods. We argue that progress towards deep analysis depends on a) enriching shallow representations with linguistically motivated, rich information, and b) focussing different branches of research and combining ressources to create synergies with related work in NLP. In the paper, we propose SentiFrameNet, an extension to FrameNet, as a novel representation for sentiment analysis that is tailored to these aims.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Linguistik (410)
    Schlagworte: Frame-Semantik; Propositionale Einstellung; Automatische Textanalyse
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  9. Evaluating the Impact of Coder Errors on Active Learning
    Erschienen: 2016
    Verlag:  Stroudsburg : Association for Computational Linguistics

    Active Learning (AL) has been proposed as a technique to reduce the amount of annotated data needed in the context of supervised classification. While various simulation studies for a number of NLP tasks have shown that AL works well on goldstandard... mehr

     

    Active Learning (AL) has been proposed as a technique to reduce the amount of annotated data needed in the context of supervised classification. While various simulation studies for a number of NLP tasks have shown that AL works well on goldstandard data, there is some doubt whether the approach can be successful when applied to noisy, real-world data sets. This paper presents a thorough evaluation of the impact of annotation noise on AL and shows that systematic noise resulting from biased coder decisions can seriously harm the AL process. We present a method to filter out inconsistent annotations during AL and show that this makes AL far more robust when applied to noisy data.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Linguistik (410)
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  10. There’s no Data like More Data? Revisiting the Impact of Data Size on a Classification Task
    Erschienen: 2016
    Verlag:  European Language Resources Association

    In the paper we investigate the impact of data size on a Word Sense Disambiguation task (WSD). We question the assumption that the knowledge acquisition bottleneck, which is known as one of the major challenges for WSD, can be solved by simply... mehr

     

    In the paper we investigate the impact of data size on a Word Sense Disambiguation task (WSD). We question the assumption that the knowledge acquisition bottleneck, which is known as one of the major challenges for WSD, can be solved by simply obtaining more and more training data. Our case study on 1,000 manually annotated instances of the German verb drohen (threaten) shows that the best performance is not obtained when training on the full data set, but by carefully selecting new training instances with regard to their informativeness for the learning process (Active Learning). We present a thorough evaluation of the impact of different sampling methods on the data sets and propose an improved method for uncertainty sampling which dynamically adapts the selection of new instances to the learning progress of the classifier, resulting in more robust results during the initial stages of learning. A qualitative error analysis identifies problems for automatic WSD and discusses the reasons for the great gap in performance between human annotators and our automatic WSD system.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Linguistik (410)
    Schlagworte: Dokumentverarbeitung; Automatische Sprachanalyse; Annotation
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  11. Bringing Active Learning to Life
    Erschienen: 2016
    Verlag:  Beijing : Tsinghua University Press

    Active learning has been applied to different NLP tasks, with the aim of limiting the amount of time and cost for human annotation. Most studies on active learning have only simulated the annotation scenario, using prelabelled gold standard data. We... mehr

     

    Active learning has been applied to different NLP tasks, with the aim of limiting the amount of time and cost for human annotation. Most studies on active learning have only simulated the annotation scenario, using prelabelled gold standard data. We present the first active learning experiment for Word Sense Disambiguation with human annotators in a realistic environment, using fine-grained sense distinctions, and investigate whether AL can reduce annotation cost and boost classifier performance when applied to a real-world task.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Linguistik (410)
    Schlagworte: Computerlinguistik; Annotation
    Lizenz:

    creativecommons.org/licenses/by-nc-sa/3.0/de ; info:eu-repo/semantics/openAccess

  12. Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation
    Erschienen: 2016
    Verlag:  The Association for Computational Linguistics and The Asian Federation of Natural Processing

    In this paper, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. While we found no conclusive evidence that it can speed up human annotation, automatic pre-annotation... mehr

     

    In this paper, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. While we found no conclusive evidence that it can speed up human annotation, automatic pre-annotation does increase its overall quality.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Linguistik (410)
    Schlagworte: Frame-Semantik; Automatische Sprachanalyse; Annotation
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  13. MaJo - A Toolkit for Supervised Word Sense Disambiguation and Active Learning
    Erschienen: 2016
    Verlag:  Milano : EDUCatt

    We present MaJo, a toolkit for supervised Word Sense Disambiguation (WSD), with an interface for Active Learning. Our toolkit combines a flexible plugin architecture which can easily be extended, with a graphical user interface which guides the user... mehr

     

    We present MaJo, a toolkit for supervised Word Sense Disambiguation (WSD), with an interface for Active Learning. Our toolkit combines a flexible plugin architecture which can easily be extended, with a graphical user interface which guides the user through the learning process. MaJo integrates off-the-shelf NLP tools like POS taggers, treebank-trained statistical parsers, as well as linguistic resources like WordNet and GermaNet. It enables the user to systematically explore the benefit gained from different feature types for WSD. In addition, MaJo provides an Active Learning environment, where the system presents carefully selected instances to a human oracle. The toolkit supports manual annotation of the selected instances and re-trains the system on the extended data set. MaJo also provides the means to evaluate the performance of the system against a gold standard. We illustrate the usefulness of our system by learning the frames (word senses) for three verbs from the SALSA corpus, a version of the TiGer treebank with an additional layer of frame-semantic annotation. We show how MaJo can be used to tune the feature set for specific target words and so improve performance for these targets. We also show that syntactic features, when carefully tuned to the target word, can lead to a substantial increase in performance.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Linguistik (410)
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  14. STTS goes Kiez – Experiments on Annotating and Tagging Urban Youth Language
  15. Towards a syntactically motivated analysis of modifiers in German
    Erschienen: 2016
    Verlag:  Hildesheim : Universitätsverlag Hildesheim

    The Stuttgart-Tübingen Tagset (STTS) is a widely used POS annotation scheme for German which provides 54 different tags for the analysis on the part of speech level. The tagset, however, does not distinguish between adverbs and different types of... mehr

     

    The Stuttgart-Tübingen Tagset (STTS) is a widely used POS annotation scheme for German which provides 54 different tags for the analysis on the part of speech level. The tagset, however, does not distinguish between adverbs and different types of particles used for expressing modality, intensity, graduation, or to mark the focus of the sentence. In the paper, we present an extension to the STTS which provides tags for a more fine-grained analysis of modification, based on a syntactic perspective on parts of speech. We argue that the new classification not only enables us to do corpus-based linguistic studies on modification, but also improves statistical parsing. We give proof of concept by training a data-driven dependency parser on data from the TiGer treebank, providing the parser a) with the original STTS tags and b) with the new tags. Results show an improved labelled accuracy for the new, syntactically motivated classification.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Annotation; Automatische Sprachanalyse; Korpus
    Lizenz:

    creativecommons.org/licenses/by/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

  16. POS error detection in automatically annotated corpora
    Autor*in: Rehbein, Ines
    Erschienen: 2016
    Verlag:  Stroudsburg, PA : ACL

    Recent work on error detection has shown that the quality of manually annotated corpora can be substantially improved by applying consistency checks to the data and automatically identifying incorrectly labelled instances. These methods, however, can... mehr

     

    Recent work on error detection has shown that the quality of manually annotated corpora can be substantially improved by applying consistency checks to the data and automatically identifying incorrectly labelled instances. These methods, however, can not be used for automatically annotated corpora where errors are systematic and cannot easily be identified by looking at the variance in the data. This paper targets the detection of POS errors in automatically annotated corpora, so-called silver standards, showing that by combining different measures sensitive to annotation quality we can identify a large part of the errors and obtain a substantial increase in accuracy.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Korpus; Automatische Sprachanalyse; Annotation
    Lizenz:

    creativecommons.org/licenses/by/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

  17. The KiezDeutsch Korpus (KiDKo) Release 1.0
    Erschienen: 2016
    Verlag:  Paris : European Language Resources Association

    This paper presents the first release of the KiezDeutsch Korpus (KiDKo), a new language resource with multiparty spoken dialogues of Kiezdeutsch, a newly emerging language variety spoken by adolescents from multi-ethnic urban areas in Germany. The... mehr

     

    This paper presents the first release of the KiezDeutsch Korpus (KiDKo), a new language resource with multiparty spoken dialogues of Kiezdeutsch, a newly emerging language variety spoken by adolescents from multi-ethnic urban areas in Germany. The first release of the corpus includes the transcriptions of the data as well as a normalisation layer and part-of-speech annotations. In the paper, we describe the main features of the new resource and then focus on automatic POS tagging of informal spoken language. Our tagger achieves an accuracy of nearly 97% on KiDKo. While we did not succeed in further improving the tagger using ensemble tagging, we present our approach to using the tagger ensembles for identifying error patterns in the automatically tagged data.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Gesprochene Sprache; Stadtmundart; Jugendsprache; Multikulturelle Gesellschaft; Korpus
    Lizenz:

    creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

  18. Discussing best practices for the annotation of Twitter microtext
    Erschienen: 2016
    Verlag:  Sofia : Bulgarian Academy of Sciences

    This paper contributes to the discussion on best practices for the syntactic analysis of non-canonical language, focusing on Twitter microtext. We present an annotation experiment where we test an existing POS tagset, the Stuttgart-Tübingen Tagset... mehr

     

    This paper contributes to the discussion on best practices for the syntactic analysis of non-canonical language, focusing on Twitter microtext. We present an annotation experiment where we test an existing POS tagset, the Stuttgart-Tübingen Tagset (STTS), with respect to its applicability for annotating new text from the social media, in particular from Twitter microblogs. We discuss different tagset extensions proposed in the literature and test our extended tagset on a set of 506 tweets (7.418 tokens) where we achieve an inter-annotator agreement for two human annotators in the range of 92.7 to 94.4 (k). Our error analysis shows that especially the annotation of Twitterspecific phenomena such as hashtags and at-mentions causes disagreements between the human annotators. Following up on this, we provide a discussion of the different uses of the @- and #-marker in Twitter and argue against analysing both on the POS level by means of an at-mention or hashtag label. Instead, we sketch a syntactic analysis which describes these phenomena by means of syntactic categories and grammatical functions.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Syntaktische Analyse; Annotation; Twitter <Softwareplattform>
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  19. Extending the STTS for the Annotation of Spoken Language
    Erschienen: 2016
    Verlag:  Wien : Eigenverlag ÖGAI

    This paper presents an extension to the Stuttgart-Tübingen TagSet, the standard part-of-speech tag set for German, for the annotation of spoken language. The additional tags deal with hesitations, backchannel signals, interruptions, onomatopoeia and... mehr

     

    This paper presents an extension to the Stuttgart-Tübingen TagSet, the standard part-of-speech tag set for German, for the annotation of spoken language. The additional tags deal with hesitations, backchannel signals, interruptions, onomatopoeia and uninterpretable material. They allow one to capture phenomena specific to spoken language while, at the same time, preserving inter-operability with already existing corpora of written language.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Korpus; Gesprochene Sprache; Annotation; Automatische Sprachanalyse; Interoperabilität
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  20. Data point selection for self-training
    Autor*in: Rehbein, Ines
    Erschienen: 2016
    Verlag:  Stroudsburg, PA : Association for Computational

    Problems for parsing morphologically rich languages are, amongst others, caused by the higher variability in structure due to less rigid word order constraints and by the higher number of different lexical forms. Both properties can result in sparse... mehr

     

    Problems for parsing morphologically rich languages are, amongst others, caused by the higher variability in structure due to less rigid word order constraints and by the higher number of different lexical forms. Both properties can result in sparse data problems for statistical parsing. We present a simple approach for addressing these issues. Our approach makes use of self-training on instances selected with regard to their similarity to the annotated data. Our similarity measure is based on the perplexity of part-of-speech trigrams of new instances measured against the annotated training data. Preliminary results show that our method outperforms a self-training setting where instances are simply selected by order of occurrence in the corpus and argue that selftraining is a cheap and effective method for improving parsing accuracy for morphologically rich languages.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Satzanalyse; Automatische Sprachanalyse
    Lizenz:

    creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

  21. Hard constraints for grammatical function labelling
    Erschienen: 2016
    Verlag:  Stroudsburg, PA : Association for Computational Linguistics

    For languages with (semi-) free word order (such as German), labelling grammatical functions on top of phrase-structural constituent analyses is crucial for making them interpretable. Unfortunately, most statistical classifiers consider only local... mehr

     

    For languages with (semi-) free word order (such as German), labelling grammatical functions on top of phrase-structural constituent analyses is crucial for making them interpretable. Unfortunately, most statistical classifiers consider only local information for function labelling and fail to capture important restrictions on the distribution of core argument functions such as subject, object etc., namely that there is at most one subject (etc.) per clause. We augment a statistical classifier with an integer linear program imposing hard linguistic constraints on the solution space output by the classifier, capturing global distributional restrictions. We show that this improves labelling quality, in particular for argument grammatical functions, in an intrinsic evaluation, and, importantly, grammar coverage for treebankbased (Lexical-Functional) grammar acquisition and parsing, in an extrinsic evaluation.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Phrasenstruktur; Automatische Sprachanalyse
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  22. Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
    Erschienen: 2016
    Verlag:  Paris : European Language Resources Association

    In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse... mehr

     

    In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe ouron annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according todifferent discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mappedone another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts,find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 wrt. the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly theof the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Korpus; Gesprochene Sprache; Annotation; Irisch
    Lizenz:

    creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

  23. Scalable Discriminative Parsing for German
    Erschienen: 2016
    Verlag:  Stroudsburg, PA : Association for Computational Linguistics

    Generative lexicalized parsing models, which are the mainstay for probabilistic parsing of English, do not perform as well when applied to languages with different language-specific properties such as free(r) word order or rich morphology. For German... mehr

     

    Generative lexicalized parsing models, which are the mainstay for probabilistic parsing of English, do not perform as well when applied to languages with different language-specific properties such as free(r) word order or rich morphology. For German and other non-English languages, linguistically motivated complex treebank transformations have been shown to improve performance within the framework of PCFG parsing, while generative lexicalized models do not seem to be as easily adaptable to these languages. In this paper, we show a practical way to use grammatical functions as first-class citizens in a discriminative model that allows to extend annotated treebank grammars with rich feature sets without having to suffer from sparse data problems. We demonstrate the flexibility of the approach by integrating unsupervised PP attachment and POS-based word clusters into the parser.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Deutsch; Syntaktische Analyse; Automatische Sprachanalyse; Grammatik
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess