
Commit fd2a999

Update readme for visualization
1 parent fc7a967 commit fd2a999

1 file changed

Lines changed: 1 addition & 1 deletion


ac_dc/visualization/README.md

@@ -2,7 +2,7 @@

 Use this visualization tool online at https://huggingface.co/spaces/huggingface/text-data-filtering.

-However, it is faster (and can handle in practice up to three times more documents) by running the code on your computer.
+However, by running the code on your computer, it is faster, it can handle in practice up to three times more documents, and it works for every language.

 1) Use get_data_for_visualization.py to get the json gathering examples with their computed statistics for the language you chose.
 It uses the streaming mode of the Datasets library, so no need to download the dataset, but you have to download the fasttext model (for the language identification) and the kenlm / sentencepiece models (for the perplexity).
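The streaming step described in the README can be illustrated with a minimal sketch. This is not the actual contents of get_data_for_visualization.py: the helpers `compute_stats`, `gather_examples` and `save_examples`, the statistic names and the JSON layout are all hypothetical, and the fasttext (language identification) and kenlm / sentencepiece (perplexity) scores are omitted. In the real workflow the stream would come from `datasets.load_dataset(..., split="train", streaming=True)`, which returns an iterable dataset; here a small in-memory iterator stands in so the sketch runs offline.

```python
import json
from itertools import islice
from typing import Iterable


def compute_stats(text: str) -> dict:
    """Hypothetical, simplified per-document statistics (the real tool computes more)."""
    words = text.split()
    special = sum(not c.isalnum() and not c.isspace() for c in text)
    return {
        "number_words": len(words),
        "character_count": len(text),
        "special_characters_ratio": special / len(text) if text else 0.0,
    }


def gather_examples(stream: Iterable[dict], num_docs: int, text_key: str = "text") -> list:
    # islice pulls only the first num_docs items from the stream, so a
    # streamed dataset is never downloaded in full.
    return [
        {"text": ex[text_key], **compute_stats(ex[text_key])}
        for ex in islice(stream, num_docs)
    ]


def save_examples(examples: list, path: str) -> None:
    # Write the examples with their statistics to a JSON file for the visualization.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(examples, f, ensure_ascii=False)


if __name__ == "__main__":
    # Stand-in for load_dataset(..., streaming=True); an assumption for offline use.
    docs = iter([{"text": "Hello, world!"}, {"text": "a b c"}])
    save_examples(gather_examples(docs, num_docs=2), "examples_with_stats.json")
```

Because `islice` stops after `num_docs` items, memory use and download size depend on how many documents you ask for, not on the size of the full dataset.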
