Commit fc7a967

Update readme for visualization
1 parent 8e69e3a commit fc7a967

1 file changed

Lines changed: 4 additions & 0 deletions

ac_dc/visualization/README.md
@@ -1,5 +1,9 @@
 # Visualization tool
 
+Use this visualization tool online at https://huggingface.co/spaces/huggingface/text-data-filtering.
+
+However, running the code on your computer is faster and can, in practice, handle up to three times more documents.
+
 1) Use get_data_for_visualization.py to get the JSON file gathering examples with their computed statistics for the language you chose.
 It uses the streaming mode of the Datasets library, so there is no need to download the dataset, but you have to download the fastText model (for language identification) and the KenLM / SentencePiece models (for perplexity).
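
As a rough illustration of the streaming mode mentioned in step 1, the sketch below reads a fixed number of examples from a dataset without downloading it in full. It is not taken from get_data_for_visualization.py: the helper names `open_stream` and `take_examples`, and the dataset and config names in the usage comment, are assumptions chosen for illustration.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def open_stream(dataset_name: str, config_name: str, split: str = "train") -> Iterable[Dict[str, Any]]:
    """Open a dataset in streaming mode (hypothetical helper, not from the repo)."""
    # Imported lazily so the rest of the sketch runs without the library installed.
    from datasets import load_dataset

    # streaming=True yields examples on demand instead of downloading the dataset first.
    return load_dataset(dataset_name, config_name, split=split, streaming=True)


def take_examples(stream: Iterable[Dict[str, Any]], num_examples: int) -> List[Dict[str, Any]]:
    """Collect at most num_examples items from any iterable of examples."""
    return list(islice(stream, num_examples))


# Example usage (assumed dataset and config names; requires network access):
# examples = take_examples(open_stream("oscar", "unshuffled_deduplicated_en"), 1000)
```

Because `take_examples` accepts any iterable, it can be tried on a small in-memory generator before pointing it at a real streamed dataset.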
