Results for *

18 results were found.

Showing results 1 to 18 of 18.

  1. Planning for new media : the bibliography of German grammar goes online
    Published: 2010
    Publisher:  Mannheim : Institut für Deutsche Sprache

    The evolution of computer technologies and the introduction of the World Wide Web (WWW) have substantially changed the way scientific articles and books are published today. Besides writing for "traditional" print media, more and more authors decide to reach a larger audience and to decrease distribution time by offering their works on the internet. The electronic medium not only facilitates the spread of information, it also adds new value by extending the possibilities of knowledge retrieval. Of course, the same is true for structured data collections like scientific glossaries, dictionaries or bibliographies. They particularly profit from the web when being accessible via user-friendly and effective frontends. The following chapters deal with the transformation of the Bibliography of German Grammar (“Bibliografie zur deutschen Grammatik”) from a data pool primarily used for print publishing to a relational database application offering a basis for media-independent distribution. Starting with a short description of the beginnings of the bibliography, the focus of this article lies on the explanation of our current database design as well as on the presentation of the web-based user interface.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Book (monograph)
    Format: Online
    DDC classification: Germanic languages; German (430)
    Keywords: Dictionary; Internet; Online publication; German
    License:

    creativecommons.org/licenses/by-nc-nd/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

  2. GenitivDB - a corpus-generated database for German genitive classification
    Published: 2014
    Publisher:  European Language Resources Association (ELRA)

    We present a novel NLP resource for the explanation of linguistic phenomena, built and evaluated exploring very large annotated language corpora. For the compilation, we use the German Reference Corpus (DeReKo) with more than 5 billion word forms, which is the largest linguistic resource worldwide for the study of contemporary written German. The result is a comprehensive database of German genitive formations, enriched with a broad range of intra- and extralinguistic metadata. It can be used for the notoriously controversial classification and prediction of genitive endings (short endings, long endings, zero-marker). We also evaluate the main factors influencing the use of specific endings. To get a general idea about a factor’s influences and its side effects, we calculate chi-square tests and visualize the residuals with an association plot. The results are evaluated against a gold standard by implementing tree-based machine learning algorithms. For the statistical analysis, we applied the supervised Logistic Model Trees (LMT) algorithm, using the WEKA software. We intend to use this gold standard to evaluate GenitivDB, as well as to explore methodologies for a predictive genitive model.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Germanic languages; German (430)
    Keywords: German; Genitive; Corpus
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  3. Using a domain ontology for the semantic-statistical classification of specialist hypertexts
    Published: 2015

    In this feasibility study, we aim to contribute to the practical use of domain ontologies for hypertext classification by introducing an algorithm generating potential keywords. The algorithm uses structural markup information and lemmatized word lists as well as a domain ontology on linguistics. We present the calculation and ranking of keyword candidates based on ontology relationships, word position, frequency information, and statistical significance as evidenced by log-likelihood tests. Finally, the results of our machine-driven classification are validated empirically against manually assigned keywords.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Germanic languages; German (430)
    Keywords: Linguistic data processing; Knowledge presentation; Semantic network; Grammar; German
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  4. Introducing Interactive Grammar: How to Develop Language Competence with Research-based Learning
    Published: 2023
    Publisher:  Hagen : FernUniversität in Hagen ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]

    We present the implementation of an interactive e-learning platform for both classroom study and self-study that helps develop German language competence – vocabulary, spelling, and grammar – at various levels and for everyday applications. The LernGrammis portal addresses school and university students, (prospective) teachers, and L2 learners of German alike, each with appropriate educational content and interactive components. It thus offers the digital networking infrastructure for education a unique, freely available, and scientifically grounded learning resource. Applying the innovative concept of "Research-based Learning (RBL)", LernGrammis provides teachers with ideas for lesson planning and learners with dedicated modules for developing new skills by exploring authentic language resources and thereby answering customised low-threshold research questions. Using proven practical examples, we demonstrate the approach, its strengths and possibilities, and initial evaluation results from user feedback.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Journal article
    Format: Online
    DDC classification: Language (400)
    Keywords: Grammar; Language acquisition; Foreign language learning; Spelling; School education; Vocabulary; E-learning; Interactive; Language; German
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  5. A Functional Database Framework for Querying Very Large Multi-Layer Corpora
    Published: 2015
    Publisher:  Hamburg : Universität Hamburg

    Linguistic query systems are special purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Information retrieval
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  6. E-VALBU: Advanced SQL/XML processing of dictionary data using an object-relational XML database
    Published: 2015
    Publisher:  Duisburg : Universitätsverlag Rhein-Ruhr

    Contemporary practical lexicography uses a wide range of advanced technological aids, most prominently database systems for the administration of dictionary content. Since XML has become a de facto standard for the coding of lexicographic articles, integrated markup functionality – such as query, update, or transformation of instances – is of particular importance. Even the multi-channel distribution of dictionary data benefits from powerful XML database services. Exemplified by E-VALBU, the most comprehensive electronic dictionary on German verb valency, we outline an integrated approach for advanced XML storing and processing within an object-relational database, and for a public retrieval frontend using Web Services and AJAX technology.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Journal article
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: Computer-assisted lexicography; Valency; German; Electronic dictionary
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  7. A database-driven ontology for German grammar
    Published: 2015
    Publisher:  Tübingen : Narr

    The main objective of this article is to describe the current activities at the Mannheim Institute for German Language regarding the implementation of a domain-specific ontology for German grammar. We differentiate ontology bases from ontology management systems, point out the benefits of database-driven solutions, and go step by step through all phases of the ontology lifecycle. In order to demonstrate the practical use of our approach, we outline the interface between our ontology and the grammis web information system, and compare the ontology-based retrieval mechanism with traditional full text search.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Article in an edited volume
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: German; Grammar; Grammis; Ontology <knowledge processing>
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  8. Extracting specialized terminology from linguistic corpora
    Published: 2018
    Publisher:  Heidelberg : Heidelberg University Publishing

    In this paper, we present our approach to automatically extracting German terminology in the domain of grammar using texts from the online information system grammis as our corpus. We analyze existing repositories of German grammatical terminology and develop part-of-speech patterns for our extraction, thereby showing the importance of unigrams in this domain. We contrast the results of the automatic extraction with a manually extracted standard. By comparing the performance of well-known statistical measures, we show how measures based on corpus comparison outperform alternative methods.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Article in an edited volume
    Format: Online
    DDC classification: Language (400)
    Keywords: Terminology; Grammar; Grammis; German; Natural language processing
    License:

    creativecommons.org/licenses/by-sa/4.0/deed.de ; info:eu-repo/semantics/openAccess

  9. Example-based querying for linguistic specialist corpora
    Published: 2018
    Publisher:  Paris : European language resources association (ELRA)

    The paper describes preliminary studies regarding the usage of Example-Based Querying for specialist corpora. We outline an infrastructure for its application within the linguistic domain. Example-Based Querying deals with retrieval situations where users would like to explore large collections of specialist texts semantically, but are unable to explicitly name the linguistic phenomenon they look for. As a way out, the proposed framework allows them to input prototypical everyday language examples or cases of doubt, which are automatically processed by CRF and linked to appropriate linguistic texts in the corpus.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Article in an edited volume
    Format: Online
    DDC classification: Language (400)
    Keywords: Grammar; Syntax; Automatic language analysis; Corpus; Information retrieval
    License:

    creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

  10. KoGra-DB: Using MapReduce for language corpora
    Published: 2018
    Publisher:  Bonn-Buschdorf : Köllen

    Linguistic query systems are special purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Article in an edited volume
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Automatic language analysis
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  11. Re-designing Online Terminology Resources for German Grammar. Project Report

    The compilation of terminological vocabularies plays a central role in the organization and retrieval of scientific texts. Both simple keyword lists and sophisticated modellings of relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and finding of appropriate digital documents, either on the Web or within local repositories. This seems especially true for long-established scientific fields with various theoretical and historical branches, such as linguistics, where the use of terminology within documents from different origins is sometimes far from being consistent. In this short paper, we report on the early stages of a project that aims at the re-design of an existing domain-specific KOS for grammatical content, grammis. In particular, we deal with the terminological part of grammis and present the state of the art of this online resource as well as the key re-design principles. Further, we propose questions regarding ramifications of the Linked Open Data and Semantic Web approaches for our re-design decisions.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: Terminology; Information management; Linguistics; Grammar
    License:

    creativecommons.org/licenses/by-nc/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

  12. GeCoTagger: annotation of German verb complements with conditional random fields
    Published: 2018
    Publisher:  Paris, France : European language resources association (ELRA)

    Complement phrases are essential for constructing well-formed sentences in German. Identifying verb complements and categorizing complement classes is challenging even for linguists who are specialized in the field of verb valency. Against this background, we introduce an ML-based algorithm which is able to identify and classify complement phrases of any German verb in any written sentence context. We use a large training set consisting of example sentences from a valency dictionary, enriched with POS tagging, and the ML-based technique of Conditional Random Fields (CRF) to generate the classification models.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Article in an edited volume
    Format: Online
    DDC classification: Germanic languages; German (430)
    Keywords: Grammar; German; Complement; Verb; Valency
    License:

    creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

  13. An empirically validated, onomasiologically structured, and linguistically motivated online terminology. Re-designing scientific resources on German grammar
    Published: 2018
    Publisher:  Berlin [et al.] : Springer

    Terminological resources play a central role in the organization and retrieval of scientific texts. Both simple keyword lists and advanced modelings of relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and finding of appropriate digital documents, either on the web or within local repositories. This seems especially true for long-established scientific fields with elusive theoretical and historical branches, where the use of terminology within documents from different origins is often far from being consistent. In this paper, we report on the progress of a linguistically motivated project on the onomasiological re-modeling of the terminological resources for the grammatical information system grammis. We present the design principles and the results of their application. In particular, we focus on new features for the authoring backend and discuss how these innovations help to evaluate existing, loosely structured terminological content, as well as to efficiently deal with automatic term extraction. Furthermore, we introduce a transformation to a future SKOS representation. We conclude with a positioning of our resources with regard to the Knowledge Organization discourse and discuss how a highly complex information environment like grammis benefits from the re-designed terminological KOS.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Journal article
    Format: Online
    DDC classification: Germanic languages; German (430)
    Keywords: German; Grammis; Information system; Terminology; Visualization
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  14. A Corpus Linguistic Perspective on Contemporary German Pop Lyrics with the Multi-Layer Annotated "Songkorpus"
    Published: 2020
    Publisher:  Paris : European Language Resources Association

    Song lyrics can be considered as a text genre that has features of both written and spoken discourse, and potentially provides extensive linguistic and cultural information to scientists from various disciplines. However, pop songs play a rather subordinate role in empirical language research so far - most likely due to the absence of scientifically valid and sustainable resources. The present paper introduces a multiply annotated corpus of German lyrics as a publicly available basis for multidisciplinary research. The resource contains three types of data for the investigation and evaluation of quite distinct phenomena: TEI-compliant song lyrics as primary data, linguistically and literary motivated annotations, and extralinguistic metadata. It promotes empirically/statistically grounded analyses of genre-specific features, systemic-structural correlations and tendencies in the texts of contemporary pop music. The corpus has been stratified into thematic and author-specific archives; the paper presents some basic descriptive statistics, as well as the public online frontend with its built-in evaluation forms and live visualisations.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Lyrics <poetry>; Pop music; Language variant; Research data; German
    License:

    creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

  15. Data-driven identification of idioms in song lyrics
    Published: 2021
    Publisher:  Stroudsburg : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    The automatic recognition of idioms poses a challenging problem for NLP applications. Whereas native speakers can intuitively handle multiword expressions whose compositional meanings are hard to trace back to individual word semantics, there is still ample scope for improvement regarding computational approaches. We assume that idiomatic constructions can be characterized by gradual intensities of semantic non-compositionality, formal fixedness, and unusual usage context, and introduce a number of measures for these characteristics, comprising count-based and predictive collocation measures together with measures of context (un)similarity. We evaluate our approach on a manually labelled gold standard, derived from a corpus of German pop lyrics. To this end, we apply a Random Forest classifier to analyze the individual contribution of features for automatically detecting idioms, and study the trade-off between recall and precision. Finally, we evaluate the classifier on an independent dataset of idioms extracted from a list of Wikipedia idioms, achieving state-of-the-art accuracy.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Phraseology; Lyrics <poetry>; Automatic speech recognition; Automatic language analysis; Composition <word formation>; Semantics; German
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  16. Shallow context analysis for German idiom detection
    Published: 2023
    Publisher:  Geneva : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    In order to differentiate between figurative and literal usage of verb-noun combinations for the shared task on the disambiguation of German Verbal Idioms issued for KONVENS 2021, we apply and extend an approach originally developed for detecting idioms in a dataset consisting of random ngram samples. The classification is done by implementing a rather shallow, statistics-based pipeline without intensive preprocessing and examinations on the morphosyntactic and semantic level. We describe the overall approach, the differences between the original dataset and the dataset of the KONVENS task, provide experimental classification results, and analyse the individual contributions of our feature sets.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Context analysis; German; Phraseology; Dataset; Automatic language analysis; Computational linguistics
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  17. Decision Tree-Based Evaluation of Genitive Classification – An Empirical Study on CMC and Text Corpora. Language Processing and Knowledge in the Web
    Published: 2016
    Publisher:  Berlin/Heidelberg : Springer

    Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) position themselves between orality and literacy, and beyond that provide insight into the impact of "new", mainly internet-based media on language behaviour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine learning algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Article in an edited volume
    Format: Online
    DDC classification: Linguistics (410)
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  18. Evaluating DBMS-based Access Strategies to Very Large Multi-layer Corpora
    Published: 2016
    Publisher:  Paris : European Language Resources Association (ELRA)

    Linguistic query systems are special-purpose IR applications. As text sizes, annotation layers, and metadata schemes of language corpora grow rapidly, performing complex searches becomes a highly computationally expensive task. We evaluate several storage models and indexing variants in two multi-processor/multi-core environments, focusing on prototypical linguistic querying scenarios. Our aim is to reveal modeling and querying tendencies – rather than absolute benchmark results – when using a relational database management system (RDBMS) and MapReduce for natural language corpus retrieval. Based on these findings, we are going to improve our approach for the efficient exploitation of very large corpora, combining advantages of state-of-the-art database systems with decomposition/parallelization strategies. Our reference implementation uses the German DeReKo reference corpus with currently more than 4 billion word forms, various multi-layer linguistic annotations, and several types of text-specific metadata. The proposed strategy is language-independent and adaptable to large-scale multilingual corpora.

     

    Source: BASE, German studies subset (Fachausschnitt Germanistik)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess