    Repositories list

    • Tools for managing datasets for governance and training.
      HTML
      Apache License 2.0
      47901383 Updated Mar 16, 2026
    • ShadesofBias

      Public
      Evaluation for Shades of Bias in Text
      HTML
      2920 Updated Apr 23, 2025
    • biomedical

      Public
      Tools for curating biomedical training data for large-scale language modeling
      Python
      11949616416 Updated Dec 9, 2024
    • xmtf

      Public
      Crosslingual Generalization through Multitask Finetuning
      Jupyter Notebook
      Apache License 2.0
      43536120 Updated Sep 22, 2024
    • petals

      Public
      🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
      Python
      MIT License
      60110k9220 Updated Sep 7, 2024
    • Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
      Shell
      Other
      1011k137 Updated Jul 29, 2024
    • Ongoing research training transformer language models at scale, including: BERT & GPT-2
      Python
      Other
      2261.4k7647 Updated Mar 20, 2024
    • BLOOM+1: Adapting BLOOM model to support a new unseen language
      Python
      Apache License 2.0
      1774136 Updated Mar 2, 2024
    • promptsource

      Public
      Toolkit for creating, sharing and using natural language prompts.
      Python
      Apache License 2.0
      3793k1132 Updated Oct 23, 2023
    • Framework for BLOOM probing
      Python
      9900 Updated Oct 17, 2023
    • Python
      Apache License 2.0
      3439945 Updated Jul 25, 2023
    • metadata

      Public
      Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.
      Python
      Apache License 2.0
      11302513 Updated Jun 12, 2023
    • A framework for few-shot evaluation of autoregressive language models.
      Python
      MIT License
      3.2k10578 Updated May 9, 2023
    • Code used for sourcing and cleaning the BigScience ROOTS corpus
      Jupyter Notebook
      Apache License 2.0
      43318100 Updated Mar 20, 2023
    • A list of BigScience publications
      TeX
      Apache License 2.0
      1310 Updated Mar 13, 2023
    • Python
      Apache License 2.0
      17200 Updated Dec 5, 2022
    • t-zero

      Public
      Reproduce results and replicate training of T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)
      Python
      Apache License 2.0
      5346482 Updated Nov 5, 2022
    • A repository for `codecarbon` logs.
      Jupyter Notebook
      51320 Updated Nov 3, 2022
    • PII processing code to detect and remediate PII in BigScience datasets. Reference implementation for the PII Hackathon.
      Python
      Other
      6971 Updated Oct 6, 2022
    • lam

      Public
      Libraries, Archives and Museums (LAM)
      Apache License 2.0
      889340 Updated Oct 4, 2022
    • Apache License 2.0
      52600 Updated Jul 11, 2022
    • A repo for running model shrinking experiments
      Python
      41000 Updated Jun 21, 2022
    • BigScience working group on language models for historical texts
      Jupyter Notebook
      7802 Updated May 10, 2022
    • Code and Data for Evaluation WG
      Python
      Other
      2442419 Updated May 4, 2022
    • Scripts to prepare catalogue data
      Jupyter Notebook
      Apache License 2.0
      1853 Updated Apr 25, 2022
    • 45112 Updated Apr 22, 2022
    • Tools for evaluating model robustness and consistency
      Python
      Other
      2202 Updated Mar 9, 2022
    • 11100 Updated Feb 27, 2022
    • Python
      Apache License 2.0
      21112 Updated Feb 16, 2022
    • Generate statistics over datasets used in the context of BigScience
      Makefile
      1200 Updated Feb 1, 2022