Commit fb31995

Update collection style format in multiple dataset JSON files
1 parent 8e2ac63 commit fb31995

208 files changed, +208 -208 lines changed

README.md

+1 -1

datasets/absa-hotels.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "web pages",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "Around 15,562 Hotels' reviews were thoroughly reviewed by this research authors and a subset of 2,291 reviews were selected. The original dataset has been collected from well known Hotels' booking websites such as Booking.com, TripAdvisor.com.",
 "Volume": "24,028",
 "Unit": "sentences",

datasets/adi-17.json

+1 -1
@@ -112,7 +112,7 @@
 "Dialect": "mixed",
 "Domain": "transcribed audio",
 "Form": "spoken",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "dialect identification of speech from YouTube to one of the 17 dialects",
 "Volume": "3,091",
 "Unit": "hours",

datasets/adi-5.json

+1 -1
@@ -40,7 +40,7 @@
 "Dialect": "mixed",
 "Domain": "transcribed audio",
 "Form": "spoken",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "This will be divided across the five major Arabic dialects; Egyptian (EGY), Levantine (LAV), Gulf (GLF), North African (NOR), and Modern Standard Arabic (MSA)",
 "Volume": "50",
 "Unit": "hours",

datasets/adpbc.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "other",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "This corpus contains the words and their dependency relation produced by performing some steps",
 "Volume": "16",
 "Unit": "documents",

datasets/adult_content_detection_on_arabic_twitter__analysis_and_experiments.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "Adult Content Detection on Arabic Twitter",
 "Volume": "50,000",
 "Unit": "sentences",

datasets/ajgt.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Jordan",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "Corpus consisted of 1,800 tweets annotated as positive and negative. Modern Standard Arabic (MSA) or Jordanian dialect.",
 "Volume": "1,800",
 "Unit": "sentences",

datasets/akec.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "news articles",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "The corpus consists in 160 arabic documents and their keyphrases.",
 "Volume": "160",
 "Unit": "documents",

datasets/alr__arabic_laptop_reviews_dataset.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "reviews",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "Arabic Laptops Reviews (ALR) dataset focuses on laptops reviews written in Arabic",
 "Volume": "1,753",
 "Unit": "sentences",

datasets/amara.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "other",
 "Form": "text",
-"Collection Style": "crawling and annotation(translation)",
+"Collection Style": "crawling,annotation,machine translation",
 "Description": "multilingually aligned for 20 languages, i.e. 20 monolingual corpora and 190 parallel corpora",
 "Volume": "154,301",
 "Unit": "sentences",

datasets/anercorp.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "news articles",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "collected from different resources ",
 "Volume": "316",
 "Unit": "documents",

datasets/anetac.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "other",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "English-Arabic named entity transliteration and classification dataset",
 "Volume": "79,924",
 "Unit": "sentences",

datasets/annotated_shami_corpus.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Lebanon",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "Subsection of the Lebanese portion of the Shami Corpus annotated for spelling standardization (CODA), morphological segmentation and tagging, and spontaneous orthography taxonomy tagging.",
 "Volume": "10,000",
 "Unit": "tokens",

datasets/annotated_tweet_corpus_in_arabizi,_french_and_english.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "In total, 17,103 sequences were annotated from 585,163 tweets (196,374 in English, 254,748 in French and 134,041 in Arabizi), including the themes \u201cOthers\u201d and \u201cIncomprehensible\u201d. Among these sequences, 4,578 sequences having at least 20 tweets annotated with the 3 predefined themes (Hooliganism, Racism and Terrorism) were obtained, including 1,866 sequences with an opinion change. They are distributed as follows: 2,141 sequences in English (57,655 tweets), 1,942 sequences in French (48,854 tweets) and 495 sequences in Arabizi (21,216 tweets). A sub-corpus of 8,733 tweets (1,209 in English, 3,938 in French and 3,585 in Arabizi) annotated as \u201chateful\u201d, according to topic/opinion annotations and by selecting tweets that contained insults, is also provided. ",
 "Volume": "134,041",
 "Unit": "sentences",

datasets/ans_corpus___claim_verification.json

+1 -1
@@ -16,7 +16,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "news articles",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "corpus comes in two perspectives: a version consisting of 4,547 true and false claims and a version consisting of 3,786 pairs (claim, evidence).",
 "Volume": "4,547",
 "Unit": "sentences",

datasets/anti-social_behaviour_in_online_communication.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "a corpus of 15,050 labelled YouTube comments in Arabic",
 "Volume": "15,050",
 "Unit": "sentences",

datasets/aoc-aldi.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "commentary",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "Comments to news articles with a continuous level of dialectness score between 0 and 1.",
 "Volume": "127,835",
 "Unit": "sentences",

datasets/aoc.json

+1 -1
@@ -22,7 +22,7 @@
 "Dialect": "mixed",
 "Domain": "news articles",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "a 52M-word monolingual dataset rich in dialectal content",
 "Volume": "108,000",
 "Unit": "sentences",

datasets/apgc_v1_0__arabic_parallel_gender_corpus_v1_0.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "other",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "a corpus designed to support research on gender bias in natural language processing applications working on Arabic",
 "Volume": "12,000",
 "Unit": "sentences",

datasets/aqmar.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "wikipedia",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "This is a 74,000-token corpus of 28 Arabic Wikipedia articles hand-annotated for named entities.",
 "Volume": "74,000",
 "Unit": "tokens",

datasets/ar-embiddings__arabic_word_embeddings_for_sentiment_analysis.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "A large corpus for generating Arabic word embeddings from multiple sources such as news articles, consumer reviews, Quran text, and tweets. The embeddings are used to perform sentiment analysis in both Standard and Dialectal Arabic without relying on hand-crafted features. The embeddings are applied to several binary classifiers to detect subjectivity and sentiment in Arabic texts.",
 "Volume": "190,000,000",
 "Unit": "tokens",

datasets/arab-acquis.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "other",
 "Form": "text",
-"Collection Style": "crawling and annotation(translation)",
+"Collection Style": "crawling,annotation,machine translation",
 "Description": "consists of over 12,000 sentences from the JRCAcquis (Acquis Communautaire) corpus ",
 "Volume": "12,000",
 "Unit": "sentences",

datasets/arab-esl.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "Emoji (the popular digital pictograms) are sometimes seen as a new kind of artificial and universally usable and consistent writing code. In spite of their assumed universality, there is some evidence that the sense of an emoji, specifically in regard to sentiment, may change from language to language and culture to culture. This paper investigates whether contextual emoji sentiment analysis is consistent across Arabic and European languages. To conduct this investigation, we, first, created the Arabic emoji sentiment lexicon (Arab-ESL). Then, we exploited an existing European emoji sentiment lexicon to compare the sentiment conveyed in each of the two families of language and culture (Arabic and European). The results show that the pairwise correlation between the two lexicons is consistent for emoji that represent, for instance, hearts, facial expressions, and body language. However, for a subset of emoji (those that represent objects, nature, symbols, and some human activities), there are large differences in the sentiment conveyed. More interestingly, an extremely high level of inconsistency has been shown with food emoji.",
 "Volume": "1,034",
 "Unit": "tokens",

datasets/arabic-dialect_english_parallel_text.json

+1 -1
@@ -16,7 +16,7 @@
 "Dialect": "Levant",
 "Domain": "other",
 "Form": "text",
-"Collection Style": "crawling and annotation(translation)",
+"Collection Style": "crawling,annotation,machine translation",
 "Description": "it uses crowdsourcing to cheaply and quickly build LevantineEnglish and Egyptian-English parallel corpora, consisting of 1.1M words and 380k words, respectively.",
 "Volume": "1,500,000",
 "Unit": "tokens",

datasets/arabic-english_named_entities_dataset.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "news articles",
 "Form": "text",
-"Collection Style": "crawling and annotation(translation)",
+"Collection Style": "crawling,annotation,machine translation",
 "Description": "Arabic-ENglish named entities dataset is created using DBpedia Linked datasets and parallel corpus. For annotating NE in monolingual English corpus we used Gate tool. Our approach is based on linked data entities by mapping them to Gate Gazetteers, and then constructing a type-oriented NE base covering person, Location and organization classes. The second task consists of the use of machine translation to translate these entities and then finally, generating our NE lexicon that encloses the list of Arabic entities that match to the English lists.",
 "Volume": "48,753",
 "Unit": "tokens",

datasets/arabic_dialects_dataset.json

+1 -1
@@ -40,7 +40,7 @@
 "Dialect": "mixed",
 "Domain": "other",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "Dataset of Arabic dialects for GULF, EGYPT, LEVANT, TONESIAN Arabic dialects in addition to MSA.",
 "Volume": "16,494",
 "Unit": "sentences",

datasets/arabic_flood_twitter_dataset.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "It includes 4,037 human-labelled Arabic Twitter messages for four high-risk flood events that occurred in 2018",
 "Volume": "4,037",
 "Unit": "sentences",

datasets/arabic_hate_speech_2022_shared_task.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "largest Arabic dataset for offensive, fine-grained hate speech, vulgar and violence content",
 "Volume": "12,698",
 "Unit": "sentences",

datasets/arabic_keyphrase_dataset.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "news articles",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "A dataset in Arabic language for automatic keyphrase extraction algorithms",
 "Volume": "400",
 "Unit": "documents",

datasets/arabic_named_entities.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "other",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "we have extracted\r\napproximately 45,000 Arabic NE",
 "Volume": "45,000",
 "Unit": "tokens",

datasets/arabic_named_entity_gazetteer.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "wikipedia",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "A gazetteer of entities curated from Wikipedia.",
 "Volume": "68,355",
 "Unit": "tokens",

datasets/arabic_news_dataset_about_hajj.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Classical Arabic",
 "Domain": "news articles",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "more than 2k articles about Hajj ",
 "Volume": "2,000",
 "Unit": "documents",

datasets/arabic_news_tweets.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "This dataset is a relatively great size collection of Arabic news tweets that were collected from an official and verified users in Twitter. All news that is collected from the most popular and official users in Saudi Arabia belongs to Saudi Arabia news. All data that is gathered was retrieved using specific time period and collected all news in that time. To the best of our knowledge, this dataset is the first Arabic news data collection that does not specify by keywords and belongs to Saudi Arabia. This news dataset can be valuable for diverse tasks in NLP, such as text classification and automated verification system. The dataset has been categorized into 5 different news classes which are general news, regions news, sport news, economic news, and quality life news. In this data article, 89,179 original tweets have presented and fully labeled into related categories.",
 "Volume": "89,179",
 "Unit": "sentences",

datasets/arabic_osact4___offensive_language_detection.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "OSACT4 Shared Task on Offensive Language Detection",
 "Volume": "8,000",
 "Unit": "sentences",

datasets/arabic_osact5___arabic_hate_speech.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "Fine-Grained Hate Speech Detection on Arabic Twitter",
 "Volume": "10,157",
 "Unit": "sentences",

datasets/arabic_pos_dialect.json

+1 -1
@@ -34,7 +34,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "includes tweets in Egyptian, Levantine, Gulf, and Maghrebi, with 350 tweets for each dialect with appropriate train/test/development splits for 5-fold cross validation",
 "Volume": "1,400",
 "Unit": "sentences",

datasets/arabic_punctuation_dataset.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "books",
 "Form": "text",
-"Collection Style": "crawling and annotation(translation)",
+"Collection Style": "crawling,annotation,machine translation",
 "Description": "This is a curated dataset, specifically designed to facilitate the study of punctuation. It has undergone rigorous manual annotation and verification on the basis of sentence structure, with sentence boundaries clearly marked. ",
 "Volume": "12,183,000",
 "Unit": "sentences",

datasets/arabic_rc_datasets.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "other",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "Arabic Reading Comprehension Benchmarks Created Semiautomatically",
 "Volume": "2,862",
 "Unit": "sentences",

datasets/arabic_satire_dataset.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Classical Arabic",
 "Domain": "other",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "500 Arabic news and 500 Arabic satire articles ",
 "Volume": "1,000",
 "Unit": "sentences",

datasets/arabic_sentiment_lexicons.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(translation)",
+"Collection Style": "crawling,annotation,machine translation",
 "Description": " by using distant supervision techniques on Arabic tweets, and by translating English sentiment lexicons into Arabic using a freely available statistical machine translation system",
 "Volume": "176,364",
 "Unit": "tokens",

datasets/arabic_sentiment_twitter_corpus.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "A Sentiment Analysis dataset. No extra information is provided regarding the dialects nor the collection methodology",
 "Volume": "58,000",
 "Unit": "sentences",

datasets/arabic_spam_and_ham_tweets.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "The dataset contains 13241 records. Each record represents a tweet. The tweets are labeled either Ham or Spam. Ham means non-spam tweet. There are 1924 Spam tweets and 11299 Ham tweets. The tweets are unique i.e. there are no repeated tweets records.",
 "Volume": "13,241",
 "Unit": "sentences",

datasets/arabic_tweets_about_infectious_diseases.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "mixed",
 "Domain": "social media",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "This file contains a dataset of 1266 tweets by two Arabic native speakers into five types of sources: academic, media, government, health professional, and public.",
 "Volume": "1,266",
 "Unit": "sentences",

datasets/arabic_wikireading_and_kaiflematha.json

+1 -1
@@ -9,7 +9,7 @@
 "Dialect": "Modern Standard Arabic",
 "Domain": "wikipedia",
 "Form": "text",
-"Collection Style": "crawling and annotation(other)",
+"Collection Style": "crawling,annotation",
 "Description": "high quality and large-scale Arabic reading comprehension datasets: Arabic WikiReading and KaifLematha with around +100 K instances.",
 "Volume": "100,000",
 "Unit": "documents",
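
Every hunk shown here applies the same substitution to the "Collection Style" value, so the change could plausibly be reproduced with a small script. The sketch below is an assumption about how such a migration might be done, not the script actually used for this commit: the names `STYLE_MAP`, `convert_style`, `rewrite_text`, and `rewrite_dir` are hypothetical, and the mapping contains only the two rewrites visible in the files above (other old values, if any exist in the hidden files, are left unchanged).

```python
import re
from pathlib import Path

# Old value -> new value. Only the two rewrites visible in this diff are
# listed; any other value is passed through unchanged (assumption).
STYLE_MAP = {
    "crawling and annotation(other)": "crawling,annotation",
    "crawling and annotation(translation)": "crawling,annotation,machine translation",
}

# Matches one whole line of the form:   "Collection Style": "<value>",
_LINE = re.compile(r'^(\s*"Collection Style":\s*)"([^"]*)"(,?\s*)$')


def convert_style(value: str) -> str:
    """Return the new-format value, or the input itself if it is not mapped."""
    return STYLE_MAP.get(value, value)


def rewrite_text(text: str) -> str:
    """Rewrite only the "Collection Style" line; every other line is kept as-is."""
    out = []
    for line in text.splitlines(keepends=True):
        m = _LINE.match(line)
        if m:
            line = f'{m.group(1)}"{convert_style(m.group(2))}"{m.group(3)}'
        out.append(line)
    return "".join(out)


def rewrite_dir(root: Path) -> int:
    """Apply rewrite_text to every *.json file under root; return files changed."""
    changed = 0
    for path in sorted(root.glob("*.json")):
        # newline="" disables newline translation so line endings are preserved.
        with open(path, encoding="utf-8", newline="") as f:
            old = f.read()
        new = rewrite_text(old)
        if new != old:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(new)
            changed += 1
    return changed
```

Calling `rewrite_dir(Path("datasets"))` would touch only files whose "Collection Style" line matches one of the mapped values. The sketch edits the text line by line rather than parsing and re-serializing the JSON, on the assumption that a one-line diff per file (as seen here, +1 -1) is the goal; re-dumping the JSON could change escapes such as `\u201c` or the indentation of unrelated lines.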
