BSC-LT/salamandra-7b


jsaizant committed 5 days ago

Commit dd93102 (verified) · 1 parent: a9b425c

Update README.md

Files changed (1)

README.md +3 -3

README.md CHANGED

@@ -590,8 +590,8 @@ especially if the content originates from less-regulated sources or user-generat
 **How was the data collected?**
 This dataset is constituted by combining several sources, whose acquisition methods can be classified into three groups:
-- Web-sourced datasets with some preprocessing available under permissive license (p.e. Common Crawl).
-- Domain-specific or language-specific raw crawls, always respecting robots.txt (p.e. Spanish Crawling).
+- Web-sourced datasets with some preprocessing available under permissive license.
+- Domain-specific or language-specific raw crawls, always respecting robots.txt.
 - Manually curated data obtained through collaborators, data providers (by means of legal assignment agreements) or open source projects
   (p.e. CATalog).
@@ -644,7 +644,7 @@ The original raw data was not kept.
 **Is the software that was used to preprocess/clean/label the data available? If so, please provide a link or other access point.**
-Yes, the preprocessing and filtering software is open-sourced. The [CURATE](https://github.com/langtech-bsc/CURATE) pipeline was used for Spanish Crawling and CATalog,
+Yes, the preprocessing and filtering software is open-sourced. The [CURATE](https://github.com/langtech-bsc/CURATE) pipeline was used for CATalog and other curated sources,
 and the [Ungoliant](https://github.com/oscar-project/ungoliant) pipeline was used for the OSCAR project.
 #### Uses