DSpace Repository

Corpus Carolina v1.0 Ada

dc.contributor.author Finger, Marcelo
dc.contributor.author Paixão de Sousa, Maria Clara
dc.contributor.author Namiuti, Cristiane
dc.contributor.author Martins do Monte, Vanessa
dc.coverage.spatial Brazil
dc.coverage.temporal 1970-2022
dc.date.accessioned 2022-04-04T13:23:00Z
dc.date.available 2022-04-04T13:23:00Z
dc.date.issued 2022-04-04T10:22:59Z
dc.identifier.citation Marcelo Finger et al. Carolina: a General Corpus of Contemporary Brazilian Portuguese with Provenance and Typology Information. Submitted for publication.
dc.identifier.citation Mariana L. Sturzeneker et al. Carolina's Methodology: building a large corpus with provenance and typology information. Digital Humanities and Natural Language Processing Workshop – PROPOR 2022.
dc.identifier.uri https://www5.usp.br/
dc.identifier.uri http://repositorio.uspdigital.usp.br/handle/item/355
dc.description Carolina is a general corpus of contemporary Brazilian Portuguese with information on origin and typology. It is an open corpus for Linguistics and Artificial Intelligence, with a robust volume of texts of varied typology in contemporary Brazilian Portuguese (1970-2021). The first version of the corpus, 1.0 Ada, totals 653,354,884 tokens and has been available in open access, for free download for research purposes, since March 8, 2022. Licensing information may vary from text to text; please check the TEI-XML header of each text/file. This version of the corpus contains seven typologies: (1) datasets and other corpora; (2) legislative branch; (3) social media; (4) wikis; (5) judicial branch; (6) public domain works; (7) university domains. This collection: datasets and wikis.
dc.description.sponsorship Fapesp
dc.format zip file
dc.publisher Center for Artificial Intelligence (C4AI) http://c4ai.inova.usp.br
dc.subject Contemporary Brazilian Portuguese texts
dc.subject Corpus Carolina
dc.title Corpus Carolina v1.0 Ada
dc.type Dataset
dc.description.sponsorshipId 2019/07665-4

