Results for *

31 results found.

Showing results 1 to 25 of 31.

  1. The TEI-based ISO standard “Transcription of Spoken Language” as an exchange format within CLARIN and beyond
    Published: 2021
    Publisher: Utrecht : CLARIN ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper describes the TEI-based ISO standard 24624:2016 “Transcription of spoken language” and other formats used within CLARIN for spoken language resources. It assesses the current state of support for the standard and the interoperability between these formats and with relevant tools and services. The main idea behind the paper is that a digital infrastructure providing language resources and services to researchers should also allow the combined use of resources and/or services from different contexts. This requires syntactic and semantic interoperability. We propose a solution based on the ISO/TEI format and describe the necessary steps for this format to work as an exchange format with basic semantic interoperability for spoken language resources across the CLARIN infrastructure and beyond.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: ISO standard; Oral communication; Transcription; Text Encoding Initiative; Corpus; Computational linguistics; Data management
    License: creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  2. The TEI-based ISO Standard ‘Transcription of spoken language’ as an Exchange Format within CLARIN and beyond
    Published: 2022
    Publisher: Linköping : Linköping University Electronic Press ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper describes the TEI-based ISO standard 24624:2016 ‘Transcription of spoken language’ and other formats used within CLARIN for spoken language resources. It assesses the current state of support for the standard and the interoperability between these formats and with relevant tools and services. The main idea behind the paper is that a digital infrastructure providing language resources and services to researchers should also allow the combined use of resources and/or services from different contexts. This requires syntactic and semantic interoperability. We propose a solution based on the ISO/TEI format and describe the necessary steps for this format to work as an exchange format with basic semantic interoperability for spoken language resources across the CLARIN infrastructure and beyond.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: English, Old English (420)
    Keywords: CLARIN; Spoken language; Translation; Annotation; Data management; Research data; Corpus
    License: creativecommons.org/licenses/by/4.0/deed.de ; info:eu-repo/semantics/openAccess

  3. Evaluating Workflows for Creating Orthographic Transcripts for Oral Corpora by Transcribing from Scratch or Correcting ASR-Output
    Published: 2024
    Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    Research projects incorporating spoken data either require a selection of existing speech corpora or plan to record new data. In both cases, recordings need to be transcribed to make them accessible for analysis. Underestimating the effort of transcribing can be risky. Automatic Speech Recognition (ASR) holds the promise of considerably reducing transcription effort. However, few studies have so far attempted to evaluate this potential. The present paper compares the effort of manual transcription with that of correcting ASR output. We took recordings from corpora of varying settings (interview, colloquial talk, dialectal, historic) and (i) compared two methods for creating orthographic transcripts, transcribing from scratch vs. correcting automatically created transcripts, and (ii) evaluated the influence of corpus characteristics on correction efficiency. Results suggest that, for the selected data and transcription conventions, transcribing and correcting still take equally long, at about 7 times real-time on average. The more complex the primary data, the more time has to be spent on corrections. Despite the impressive recent developments in speech technology, ASR systems seem to require further improvement, or sufficient and appropriate data for training them, before they can be of real help to conversation analysts or dialectologists.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: German; Corpus; Spoken language; Automatic speech recognition
    License: creativecommons.org/licenses/by-nc/4.0/deed.de ; info:eu-repo/semantics/openAccess

  4. Refining and Exploiting the Structural Markup of the eWDG
    Published: 2014
    Publisher: Barcelona : Institut Universitari de Linguistica Aplicada, Universitat Pompeu Fabra

    In this paper, the authors describe a semi-automated approach to refining the dictionary-entry structure of the digital version of the Wörterbuch der deutschen Gegenwartssprache (WDG, English: Dictionary of Present-day German), a dictionary compiled and published between 1952 and 1977 by the Deutsche Akademie der Wissenschaften that comprises six volumes with over 4,500 pages containing more than 120,000 headwords. We discuss the benefits of such a refinement in the context of the dictionary project Digitales Wörterbuch der deutschen Sprache (DWDS, English: Digital Dictionary of the German Language). In the current phase of the DWDS project, we aim to integrate multiple German dictionary and corpus resources into a digital lexical system (DLS). In this context, we plan to expand the current DWDS interface with several special-purpose components, which are adaptive in the sense that they offer specialized data views and search mechanisms for different dictionary functions (e.g. text comprehension, text production) and different user groups (e.g. journalists, translators, linguistic researchers, computational linguists). One prerequisite for generating such data views is selective access to the lexical items in the article structure of the dictionaries under study. For this purpose, the representation of the eWDG has to be refined. The focus of this paper is on the semi-automated approach used to transform the eWDG into a refined version in which the main structural units can be explicitly accessed. We show how this refinement opens new and flexible ways of visualizing and querying the lexicographic content of the refined version in the context of the DLS project.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Dictionaries (413)
    Keywords: Computer-assisted lexicography
    License: creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

  5. Interfacing Lexical and Ontological Information in a Multilingual Soccer FrameNet
    Published: 2014

    This paper presents ongoing work on a multilingual (English, French, German) lexical resource of soccer language. The first part describes how lexicographic descriptions based on frame-semantic principles are derived from a partially aligned multilingual corpus of soccer match reports. The remainder of the paper then discusses how different types of ontological knowledge are linked to this resource in order to provide an access structure to the resulting dictionary. It is argued that linking lexical resources and ontologies in this way provides a dictionary user with novel ways of navigating a domain vocabulary.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Dictionaries (413)
    Keywords: Sports language; Language for special purposes; Football; Computer-assisted lexicography; Dictionary; German; English; French; Corpus
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  6. Avoiding Data Graveyards : from Heterogeneous Data Collected in Multiple Research Projects to Sustainable Linguistic Resources

    This paper describes a new research initiative addressing the issue of sustainability of linguistic resources. The initiative is a cooperation between three collaborative research centres in Germany – the SFB 441 “Linguistic Data Structures” in Tübingen, the SFB 538 “Multilingualism” in Hamburg, and the SFB 632 “Information Structure” in Potsdam/Berlin. The aim of the project is to develop methods for sustainable archiving of the diverse bodies of linguistic data used at the three sites. In the first half of the paper, the data handling solutions developed so far at the three centres are briefly introduced. This is followed by an assessment of their commonalities and differences and of what these entail for the work of the new joint initiative. The second part then sketches seven areas of open questions with respect to sustainable data handling and gives a more detailed account of two of them – integration of linguistic terminologies and development of best practice guidelines.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: Research data; Linguistics; Standardization; Long-term archiving
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  7. Sustainability of Linguistic Resources

    This paper describes a new research initiative addressing the issue of sustainability of linguistic resources. This initiative is a cooperation between three linguistic collaborative research centres in Germany, which together comprise more than 40 individual research projects. These projects are involved in creating manifold language resources, especially corpora, tailored to their particular needs. The aim of the project described here is to ensure effective and sustainable access to these data for third-party researchers beyond the end of these projects. This goal involves a number of measures, such as the definition of a common data format that completely captures the heterogeneous information encoded in the individual corpora, the development of user-friendly and sustainably usable tools for processing (e.g. querying) the data, and the specification of common inventories of metadata and terminology. Moreover, the project aims to formulate general rules of best practice for creating, accessing, and archiving linguistic resources.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: Research data; Linguistics; Computational linguistics; Long-term archiving
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  8. The Kicktionary: A Multilingual Resource of the Language of Football
    Published: 2014
    Publisher: Tübingen : Narr

    This paper presents the Kicktionary, a multilingual (English, German, French) electronic lexical resource of the language of football. It explains how a corpus of football match reports was analysed according to the FrameNet and WordNet approaches and how the result of this analysis is presented to dictionary users via a website.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Dictionaries (413)
    Keywords: Computer-assisted lexicography; Sports language; Football; Corpus; Frame theory
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  9. Collaborative Commentary: Opening Up Spoken Language Databases

    We define collaborative commentary as the involvement of a research community in the interpretive annotation of electronic records. The goal of this process is the evaluation of competing theoretical claims. The process requires commentators to link their comments and related evidentiary materials to specific segments of either transcripts or electronic media. Here, we examine current work in the construction of technical methods for facilitating collaborative commentary through browser technology. To illustrate the relevance of this approach, we examine seven spoken language database projects that have reached a level of web-based publication that makes them good candidates as targets of collaborative commentary technology. For each database, we show how collaborative commentary can advance the relevant research agendas.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: Video recording; Interaction analysis; Annotation
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  10. Korpus "Skandinavische Semikommunikation" - ein mehrsprachiges Diskurskorpus auf XML-Basis [Corpus "Scandinavian Semicommunication": a multilingual XML-based discourse corpus]
    Published: 2014
    Publisher: Sankt Augustin : Gardez!-Verl.

  11. The transcription system EXMARaLDA: An application of the annotation graph formalism as the basis of a database of multilingual spoken discourse
    Published: 2014
    Publisher: Philadelphia : University of Pennsylvania - Institute for Research in Cognitive Science

    This paper describes EXMARaLDA, a system for computer transcription of spoken discourse developed and used by the SFB "Mehrsprachigkeit" (Multilingualism) at the University of Hamburg. EXMARaLDA consists of several DTDs for XML coding of transcription data and some input and output tools for these formats. Apart from being a transcription system in its own right, EXMARaLDA also acts as a mediator between older existing data formats at the SFB, and between these formats and a planned database of multilingual spoken discourse.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: Spoken language; Transcription; Computational linguistics
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  12. The database for spoken German - DGD2
    Published: 2014
    Publisher: Reykjavik : European Language Resources Association (ELRA)

    The Database for Spoken German (Datenbank für Gesprochenes Deutsch, DGD2, dgd.ids-mannheim.de) is the central platform for publishing and disseminating spoken language corpora from the Archive of Spoken German (Archiv für Gesprochenes Deutsch, AGD, agd.ids-mannheim.de) at the Institute for the German Language in Mannheim. The corpora contained in the DGD2 come from a variety of sources, some of them in-house projects, some of them external projects. Most of the corpora were originally intended either for research into the (dialectal) variation of German or for studies in conversation analysis and related fields. The AGD has taken over the task of permanently archiving these resources and making them available for reuse to the research community. To date, the DGD2 offers access to 19 different corpora, totalling around 9,000 speech events, 2,500 hours of audio recordings, or 8 million transcribed words. This paper gives an overview of the data made available via the DGD2, of the technical basis for its implementation, and of the most important functionalities it offers. The paper concludes with information about the users of the database and future plans for its development.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    Keywords: Spoken language; Corpus
    License: creativecommons.org/licenses/by-nc/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

  13. Annotating Modality Interdependencies
    Published: 2015
    Publisher: Gesellschaft für Sprachtechnologie und Computerlinguistik e.V.

    This paper discusses computational linguistic methods for the semi-automatic analysis of modality interdependencies (MID, the combination of complex resources such as speaking, writing, and visualizing) in professional cross-situational interaction settings. The overall purpose of the approach is to develop models, methods, and a framework for the description and analysis of MID forms and functions. The paper describes work in progress: the development of an annotation framework that makes it possible to annotate different data and file formats at various levels, to relate annotation levels and entries independently of the given file format, and to visualize patterns.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: Computational linguistics; Corpus; Multimodality; Automatic speech analysis
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  14. FOLKER : an annotation tool for efficient transcription of natural, multi-party interaction
    Published: 2014
    Publisher: Valletta, Malta : European Language Resources Association (ELRA)

    This paper presents FOLKER, an annotation tool developed for the efficient transcription of natural, multi-party interaction in a conversation analysis framework. FOLKER is being developed at the Institute for the German Language in and for the FOLK project, whose aim is to build a large corpus of spoken present-day German for use in research and teaching. FOLKER builds on the experience gained with multi-purpose annotation tools like ELAN and EXMARaLDA, but attempts to improve transcription efficiency by restricting and optimizing both data model and tool functionality to a single, well-defined purpose. This paper starts with a description of the GAT transcription conventions and the data model underlying the tool. It then gives an overview of the tool's functionality and compares it to that of other widely used tools.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Spoken language; Corpus; Transcription; Computational linguistics
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  15. EXMARaLDA : un système pour la constitution et l’exploitation de corpus oraux [EXMARaLDA: a system for building and exploiting oral corpora]
    Published: 2014
    Publisher: Limoges : Lambert-Lucas

    Source: BASE Fachausschnitt Germanistik
    Language: French
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Spoken language; Transcription; Computational linguistics; Standardization; Corpus
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  16. Creating and working with spoken language corpora in EXMARaLDA
    Published: 2014
    Publisher: Bozen : Europ. Akad.

    Spoken language corpora, as used in conversation analytic research, language acquisition studies and dialectology, pose a number of challenges that are rarely addressed by corpus linguistic methodology and technology. This paper starts by giving an overview of the most important methodological issues distinguishing spoken language corpus work from work with written data. It then shows what technological challenges these methodological issues entail and demonstrates how they are dealt with in the architecture and tools of the EXMARaLDA system.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Spoken language; Corpus; Computational linguistics; Written language
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  17. Transcribing and annotating spoken language with EXMARaLDA
    Published: 2014
    Publisher: Paris : ELRA

    This paper describes EXMARaLDA, an XML-based framework for the construction, dissemination and analysis of corpora of spoken language transcriptions. Starting from a prototypical example of a “partitur” (musical score) transcription, the paper presents the EXMARaLDA “single timeline, multiple tiers” data model and format, together with the EXMARaLDA Partitur-Editor, a tool for inputting and visualizing such data. This is followed by a discussion of the interaction of EXMARaLDA with other frameworks and tools that work with similar data models. Finally, the paper presents an extension of the “single timeline, multiple tiers” data model and describes its application within the EXMARaLDA system.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: Spoken language; Corpus; Transcription; Computational linguistics
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

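  18. EXMARaLDA - ein Modellierungs- und Visualisierungsverfahren für die computergestützte Transkription gesprochener Sprache [EXMARaLDA: a modelling and visualization method for the computer-assisted transcription of spoken language]
    Published: 2014
    Publisher: Wien : Österreichische Gesellschaft für Artificial Intelligence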

    This paper takes a new look at computer-assisted transcription as it is commonly practised within the fields of discourse analysis and language acquisition studies. The first part proposes a bridge between discourse analytical methodology and text technological methods, with the concept of modelling as its central idea. The second part demonstrates the EXMARaLDA system, a set of formats and tools for computer-assisted transcription that builds on the ideas developed in the first part and implements them in a way that can lead to significant improvements in current research practice.

    Source: BASE Fachausschnitt Germanistik
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: Spoken language; Transcription; Discourse analysis; Computational linguistics
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  19. A corpus-based lexical resource of spoken German in interaction
    Published: 2019
    Publisher: Brno, Czech Republic : Lexical Computing CZ s.r.o.

    This paper presents the prototype of a lexicographic resource for spoken German in interaction, which was conceived within the framework of the LeGeDe project (LeGeDe = Lexik des gesprochenen Deutsch, 'Lexis of Spoken German'). First, it summarizes the theoretical and methodological approaches that were used for the initial planning of the resource. The headword candidates were selected by analyzing corpus-based data: the data of two corpora (written and spoken German) were compared with quantitative methods. The information gathered on the selected headword candidates can be assigned to two different sections: meanings and functions in interaction. Additionally, two studies on the expectations of future users regarding the resource were carried out, and their results were also taken into account in the development of the prototype. Focusing on the presentation of the resource’s content, the paper shows both the different lexicographical information in selected dictionary entries and the information offered by the provided hyperlinks and external texts. In conclusion, it summarizes the most important innovative aspects that were specifically developed for the implementation of such a resource.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Computer-assisted lexicography; Spoken language; Corpus; German
    License: creativecommons.org/licenses/by-sa/4.0/ ; info:eu-repo/semantics/openAccess

  20. CLARIN Web Services for TEI-annotated Transcripts of Spoken Language
    Published: 2020
    Publisher: Utrecht : CLARIN

    We present web services implementing a workflow for transcripts of spoken language following TEI guidelines, in particular ISO 24624:2016 "Language resource management - Transcription of spoken language". The web services are available at our website and will be available via the CLARIN infrastructure, including the Virtual Language Observatory and WebLicht.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Text Encoding Initiative; Spoken language; Transcription; Computational linguistics; Web services
    License: creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  21. A corpus-based lexical resource of spoken German in interaction
    Published: 2019
    Publisher: Brno, Czech Republic : Lexical Computing CZ s.r.o.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Computer-assisted lexicography; Spoken language; Corpus
    License: creativecommons.org/licenses/by-sa/4.0/ ; info:eu-repo/semantics/openAccess

  22. Addressing Cha(lle)nges in Long-Term Archiving of Large Corpora
    Published: 2020
    Publisher: Paris : European Language Resources Association

    This paper addresses long-term archival for large corpora. Three aspects specific to language resources are focused, namely (1) the removal of resources for legal reasons, (2) versioning of (unchanged) objects in constantly growing resources, especially where objects can be part of multiple releases but also part of different collections, and (3) the conversion of data to new formats for digital preservation. It is motivated why language resources may have to be changed, and why formats may need to be converted. As a solution, the use of an intermediate proxy object called a signpost is suggested. The approach will be exemplified with respect to the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Long-term archiving; Usage rights; File format
    License: creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

  23. Using Full Text Indices for Querying Spoken Language Data
    Published: 2020
    Publisher: Paris : European Language Resources Association

    As part of the ZuMult project, we are currently modelling a backend architecture that should provide query access to corpora from the Archive of Spoken German (AGD) at the Leibniz Institute for the German Language (IDS). We are exploring how to reuse existing search engine frameworks that provide full-text indices and allow corpora to be queried with one of the corpus query languages (QLs) established and actively used in the corpus research community. For this purpose, we tested MTAS, an open-source, Lucene-based search engine for querying text with multilevel annotations. We applied MTAS to three oral corpora stored in the TEI-based ISO standard for transcriptions of spoken language (ISO 24624:2016). These corpora differ from the data MTAS was developed for because they contain interactions between two or more speakers and are enriched, inter alia, with timeline-based annotations. In this contribution, we report our test results and address issues that arise when search frameworks originally developed for querying written corpora are transferred to the field of spoken language.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Query; Spoken language; Text Encoding Initiative; Computational linguistics
    License: creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

  24. User, who art thou? User profiling for oral corpus platforms

    This contribution presents the background, design and results of a study of the users of three oral corpus platforms in Germany. Roughly 5,000 registered users of the Database for Spoken German (DGD), the GeWiss corpus and the corpora of the Hamburg Centre for Language Corpora (HZSK) were asked to participate in a user survey. This quantitative approach was complemented by qualitative interviews with selected users. Section 2 briefly introduces the corpus resources involved in the study. Section 3 describes the methods employed in the user studies. Section 4 summarizes the results of the studies, focusing on selected key topics. Section 5 attempts to generalize these results to broader contexts.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: German; Corpus; Spoken language; User research
    License: creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

  25. Datenbank für Gesprochenes Deutsch (DGD)
    Published: 2016
    Publisher: Duisburg : Nisaba

    Source: BASE Fachausschnitt Germanistik
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Database; Spoken language
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess