Commit 4c15ecf

2 parents 84184c8 + fd2a999 commit 4c15ecf

1 file changed

Lines changed: 1 addition & 1 deletion

ac_dc/visualization/README.md

@@ -2,7 +2,7 @@
 
 Use this visualization tool online at https://huggingface.co/spaces/huggingface/text-data-filtering.
 
-However, it is faster (and can handle in practice up to three times more documents) by running the code on your computer.
+However, by running the code on your computer, it is faster, it can handle in practice up to three times more documents, and it works for every language.
 
 1) Use get_data_for_visualization.py to get the json gathering examples with their computed statistics for the language you chose.
 It uses the streaming mode of the Datasets library, so no need to download the dataset, but you have to download the fasttext model (for the language identification) and the kenlm / sentencepiece models (for the perplexity).
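
The README text above mentions the streaming mode of the Datasets library. As a rough illustration only (this is not the code of get_data_for_visualization.py, which the diff does not show), the sketch below takes the first few documents from any iterable and attaches two simple statistics to each. The helper names `collect_examples` and `stream_dataset`, the `"text"` field, and the statistic names are assumptions made for this example; the fasttext and kenlm / sentencepiece steps are left out.

```python
import itertools
import json


def collect_examples(documents, max_docs=3):
    """Read at most `max_docs` items from an iterable of dicts.

    Each item is assumed to have a "text" field (an assumption for this sketch).
    Only the items actually consumed are ever held in memory.
    """
    examples = []
    for doc in itertools.islice(documents, max_docs):
        text = doc["text"]
        examples.append(
            {
                "text": text,
                "number_words": len(text.split()),  # whitespace-separated tokens
                "number_characters": len(text),
            }
        )
    return examples


def stream_dataset(path, config_name=None, split="train"):
    """Hypothetical helper: open a Hub dataset in streaming mode.

    With streaming=True, load_dataset returns an iterable that fetches
    examples lazily instead of downloading the whole dataset first.
    """
    from datasets import load_dataset  # imported lazily; requires `pip install datasets`

    return load_dataset(path, config_name, split=split, streaming=True)


if __name__ == "__main__":
    # Stand-in documents so the sketch runs without network access.
    fake_stream = iter([{"text": "hello world"}, {"text": "one two three"}])
    print(json.dumps(collect_examples(fake_stream, max_docs=2), indent=2))
```

Because `collect_examples` accepts any iterable, the result of `stream_dataset(...)` could be passed to it directly, and the returned list can be written out with `json.dump`.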