Open
63 commits
976c683
updated config files, added array size parameter for cluster execution
tpall Nov 20, 2025
ef13eed
updated nextflow.config
tpall Nov 20, 2025
d9a3bc6
Swap output assignments for rRNA and tRNA collections
tpall Nov 21, 2025
a087cf4
Merge branch 'dev' of https://github.com/WrightonLabCSU/DRAM into dev
tpall Nov 21, 2025
f5697b2
Merge branch 'dev' of https://github.com/tpall/DRAM into dev
tpall Nov 21, 2025
63ea268
Refactor distill script and configuration for improved clarity and fu…
tpall Nov 25, 2025
7414979
Refactor input and output path definitions for consistency in the SUM…
tpall Nov 26, 2025
a77c29e
Fix conditional check for gene columns in genome summary export to pr…
tpall Nov 26, 2025
e8b0e95
Refactor channel usage for consistency across workflows and improve r…
tpall Nov 26, 2025
4418739
Update SUMMARIZE module to use parameterized fasta column for grouping
tpall Nov 27, 2025
cd3b7ac
Fix closure in QC workflow
tpall Nov 28, 2025
ed054bd
Fix closure in DB_SEARCH workflow
tpall Nov 28, 2025
d39ff14
Updated combine_annotations.py to fix binwise summary. TODO: getting…
tpall Dec 1, 2025
64ab39e
Add QC:COLLECT_RNA to array pattern
tpall Dec 18, 2025
1ee2aea
Merge branch 'dev' of https://github.com/WrightonLabCSU/DRAM into dev
tpall Dec 23, 2025
2c93e2a
Merge branch 'dev' of https://github.com/WrightonLabCSU/DRAM into dev
tpall Dec 23, 2025
702666f
Merge branch 'dev' of https://github.com/tpall/DRAM into dev
tpall Dec 23, 2025
69b23f2
Merge branch 'dev' of https://github.com/WrightonLabCSU/DRAM into dev
tpall Mar 3, 2026
23b1a18
feat: accept gzip-compressed fasta input
tpall Apr 24, 2026
3059c89
fix: register array_size in schema so it validates
tpall Apr 24, 2026
d00f417
chore: remove unused trees subsystem and DRAM-v1 legacy setup scripts
tpall Apr 24, 2026
8c7dcf8
chore: remove DRAM-v1 db_description_builder and db_utils
tpall Apr 24, 2026
721fcaa
fix(db_search): correct DB_channel_SETUP case mismatch
tpall Apr 25, 2026
f1a60ec
fix(db_search): correct formattedOutputchannels case mismatch
tpall Apr 25, 2026
051d70b
fix(annotate): drop MMSEQS_INDEX publishDir to save disk
tpall Apr 26, 2026
dffe5ff
fix(distill): call .keys() on dict, not list, in check_columns log
tpall Apr 27, 2026
9e8e899
fix(distill): use polars to read rrna/trna/quast TSVs
tpall Apr 27, 2026
5b951c1
fix(distill): convert rrna section in make_genome_stats to polars
tpall Apr 27, 2026
00993c7
fix(distill): write genome_stats.tsv via polars write_csv
tpall Apr 27, 2026
84dc9db
fix(distill): rewrite make_genome_summary on polars + rule_parser
tpall Apr 27, 2026
0e44754
chore(distill): drop pandas-era dead code
tpall Apr 27, 2026
b1d597f
feat(dramv): vendor amg_database.tsv from v1
tpall Apr 27, 2026
01f7628
feat(dramv): vendor v1 reference sets into utils.dramv_constants
tpall Apr 27, 2026
0984ec2
feat(dramv): add dramv_flags.py — compute amg_flags + is_transposon
tpall Apr 27, 2026
34d0320
feat(dramv): add DRAMV_FLAGS process and wire into ANNOTATE
tpall Apr 27, 2026
dab18df
feat(dramv): add --amg_only mode to distill.py
tpall Apr 27, 2026
f255dde
feat(dramv): viral-mode pipeline defaults and SUMMARIZE wiring
tpall Apr 27, 2026
0cfb561
test(dramv): unit tests for compute_flags + read_scaffold_lengths
tpall Apr 27, 2026
a51d1f3
chore(dramv): register use_dramv and amg_length_from_end in nf schema
tpall Apr 27, 2026
c40a961
fix(summarize): stageAs distinct names for optional rrna/trna/quast i…
tpall Apr 27, 2026
9724960
fix(summarize): drop ext.args groupby_column override
tpall Apr 27, 2026
5a6ff33
docs(dramv): README viral-mode example + CHANGELOG entry
tpall Apr 27, 2026
413ffa3
chore(dramv): publish annotations_with_flags.tsv under ANNOTATE/
tpall Apr 28, 2026
92a4af7
feat(dramv): auto-enable use_pfam under use_dramv
tpall Apr 28, 2026
9e8bf61
Merge pull request #1 from tpall/feature/dramv-phase1
tpall Apr 28, 2026
e20ce2f
docs(readme): surface DRAM-v Phase 1 viral mode in intro and Quick Links
tpall Apr 28, 2026
2424c2b
feat(input): accept single fasta file as --input_fasta
tpall Apr 28, 2026
c44616f
fix(mmseqs_index): bump resources to process_small + enable retry
tpall Apr 28, 2026
168b70b
fix(hmmsearch): allow OOM retry and bump KOFam to process_medium
tpall Apr 29, 2026
0e5e1c8
feat(hmmsearch): chunk KOFam/VOG queries for parallel array execution
tpall Apr 29, 2026
dc83519
fix(combine_annotations): allow OOM retry and bump to process_big
tpall Apr 29, 2026
3531e39
fix(config): inline manifest refs for V2 config parser
tpall May 5, 2026
b2175d6
fix(config): make CONSTANTS work under V2 config parser
tpall May 5, 2026
3f873e3
fix(config): inline groupby_column to bypass cross-file params ref
tpall May 5, 2026
0a0c9a3
fix(config): enable nf-schema lenientMode for CLI boolean flags
tpall May 5, 2026
8333d3a
fix(config): disable SLURM job arrays by default
tpall May 5, 2026
b97fda4
chore(gitignore): ignore misc/ for local reference material
tpall May 6, 2026
3405048
feat(dramv): flag essential viral function rows per Martin 2025
tpall May 6, 2026
bec99db
fix(dramv): correct malformed identifiers in amg_database.tsv
tpall May 6, 2026
b3037c4
feat(dramv): add N flag for essential viral function genes
tpall May 6, 2026
e3fb1ca
feat(distill): exclude N-flagged genes from --amg_only output
tpall May 6, 2026
5839d34
docs(readme): document N flag and updated --amg_only filter
tpall May 6, 2026
5c256d4
fix(validation): coerce integer CLI params before nf-schema check
tpall May 7, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -56,3 +56,6 @@ nextflow-local.config

# scratch folder
scratch/

# local-only reference material (papers, notes)
misc/
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,23 @@

All notable changes to this project will be documented in this file.

## Unreleased (feature/dramv-phase1)

### Features

- **DRAM-v Phase 1: viral mode for geNomad+CheckV catalogs.** A new `--use_dramv` flag adds AMG flagging and viral-flavoured distillation on top of the bacterial annotate pipeline; no VirSorter affi-contigs file is required.
- New `DRAMV_FLAGS` process runs after `COMBINE_ANNOTATIONS` and appends `amg_flags` (M/K/E/A/P/T/F/B per DRAM v1 conventions, less the `V` flag pending VOGdb integration) and `is_transposon` columns to `raw-annotations.tsv`.
- `distill.py --amg_only` filters annotations to AMG candidates (`M` set, `A`/`P`/`T` clear), restricts the distillate form to `potential_amg=TRUE` rows, and collapses them into a single `AMG` Excel sheet.
- Viral mode forces `groupby_column=scaffold`, skips QUAST and the rRNA/tRNA collectors (none of which align with per-vMAG aggregation), and runs SUMMARIZE in `--amg_only` mode.
- Bundled assets: `bin/assets/amg_database.tsv` (ported verbatim from DRAM v1) and `bin/utils/dramv_constants.py` (TRANSPOSON_PFAMS, CELL_ENTRY_CAZYS, VIRAL_PEPTIDASES_MEROPS).
- Pytest unit suite at `tests/unit/test_dramv_flags.py` covering individual flag firing, B-flag scaffold boundaries, sub-3-gene scaffolds, K-forces-M, E (verified AMG), F window, and FASTA parsing.

### Bug Fixes

- `bin/distill.py`: rewrote the pandas-era summarisation path on top of polars + `rule_parser.evaluate_rules_on_anno`, dropped the broken pandas `write_summarized_genomes_to_xlsx` shadow, fixed `bin_taxnomy` typo, swapped `pd.read_csv` for `pl.read_csv` for rrna/trna/quast, and converted the rrna section of `make_genome_stats` to polars.
- `modules/local/distill/distill.nf`: stage rrna / trna / quast inputs under distinct names so the same `default_sheet` dummy in viral mode no longer triggers a Nextflow input-name collision.
- `conf/modules.config`: dropped the `task.ext.args = "--groupby_column …"` SUMMARIZE override that double-defined the flag and silently won the click race.
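
The double-definition problem in the last item can be illustrated with a small sketch. It uses the standard-library `argparse` as a stand-in, since `distill.py` itself is described as a click CLI; click is understood to resolve a repeated single-value option the same way, but that is an assumption here. The default `fasta` and the values on the simulated command line are hypothetical.

```python
import argparse

# Stand-in parser with a single-value option, similar in spirit to the
# --groupby_column flag (the default value here is hypothetical).
parser = argparse.ArgumentParser()
parser.add_argument("--groupby_column", default="fasta")

# Simulate a command line where the flag is supplied twice: once by the
# process script and once more by an ext.args-style override.
argv = ["--groupby_column", "scaffold", "--groupby_column", "fasta"]
args = parser.parse_args(argv)

# No error or warning is raised; the last occurrence simply wins.
print(args.groupby_column)
```

Because nothing is reported when the same option appears twice, an override appended in configuration can displace the intended value without any visible signal, which is why removing the duplicate definition is the safer fix.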

## 2.0.0-beta24 - 2026-02-03

[3659fda](https://github.com/WrightonLabCSU/DRAM/commit/3659fdaa0f9779108840e3bbf97c6d196b37a7d3)...[32d0527](https://github.com/WrightonLabCSU/DRAM/commit/32d05274be6eaeaed48de6bb5a047bd67f21fea1)
16 changes: 16 additions & 0 deletions README.md
@@ -8,6 +8,8 @@

DRAM v2 (Distilled and Refined Annotation of Metabolism Version 2) is a tool for annotating metagenomic and genomic assembled data (e.g. scaffolds or contigs) or called genes (e.g. nucleotide or amino acid format). DRAM annotates MAGs using [KEGG](https://www.kegg.jp/) (if provided by the user), [UniRef90](https://www.uniprot.org/), [PFAM](https://pfam.xfam.org/), [dbCAN](http://bcb.unl.edu/dbCAN2/), [RefSeq viral](https://www.ncbi.nlm.nih.gov/genome/viruses/), [VOGDB](http://vogdb.org/) and the [MEROPS](https://www.ebi.ac.uk/merops/) peptidase database as well as custom user databases.

Viral catalogs from a typical geNomad → CheckV pipeline are also supported via **DRAM-v Phase 1 viral mode** (`--use_dramv true`), which provides per-vMAG (per-scaffold) AMG flagging (M/K/E/A/P/T/F/B per the v1 conventions) and a viral-flavoured distillate; no VirSorter affi-contigs file is required. See the "Viral mode" example below.

DRAM is run in four stages:
1) Gene Calling (Prodigal) - genes are called on user-provided scaffolds or contigs
2) Gene Annotation - genes are annotated with a set of user-defined databases
@@ -26,6 +28,7 @@ For more detail on DRAM and how DRAM v2 works please see our DRAM products:
- [Usage Examples](https://dramit.readthedocs.io/en/latest/usage.html)
- [Parameter API](https://dramit.readthedocs.io/en/latest/params_doc.html)
- [Rules API](https://dramit.readthedocs.io/en/latest/rules_parser.html)
- [Viral mode (DRAM-v Phase 1)](#example-usage) — see example 8 below

## Example Usage

@@ -70,6 +73,19 @@ nextflow run -bg WrightonLabCSU/DRAM \
-profile singularity,full_mode
```

8) **Viral mode (DRAM-v Phase 1) — AMG flags on geNomad+CheckV catalogs:**
```bash
nextflow run WrightonLabCSU/DRAM \
--input_fasta <path/to/viral_catalog_dir> \
--outdir <output> \
--call --annotate --summarize --qc \
--use_kofam --use_dbcan --use_merops \
--use_dramv true \
-profile singularity
```
`--use_dramv` runs after `COMBINE_ANNOTATIONS` and adds two columns to `raw-annotations.tsv`:

- `amg_flags`: a string of M/K/E/A/P/T/F/B flags per the DRAM v1 conventions. There is no `V` flag, since VOGdb is not yet wired in, and there is a non-v1 `N` flag for essential viral function (described below).
- `is_transposon`: a boolean.

The distillate is filtered to strict-AMG candidates (rows with `M` and without `A`/`P`/`T`/`N`) and emitted as a single `AMG` sheet in `metabolism_summary.xlsx`, with one count column per scaffold (vMAG).

The `N` flag marks genes whose IDs hit rows of `bin/assets/amg_database.tsv` where `essential_viral_function=TRUE`, per [Martin et al. 2025](https://doi.org/10.1038/s41564-025-02095-4). These are genes the paper cautions about (DsrC, QueC/QueF, folA/folB/folK, RNR, mazG, pur*, etc.): they are likely essential for viral processes rather than auxiliary metabolism, so they are excluded from the strict-AMG sheet. They still appear in the full `raw-annotations.tsv` with `N` in `amg_flags` so users can review them.

Viral mode forces `groupby_column=scaffold` and skips QUAST and rRNA/tRNA collection, which are not meaningful at per-scaffold granularity. Inputs from a typical geNomad → CheckV pipeline (a single multi-fasta of trimmed viral contigs) work out of the box; no VirSorter affi-contigs file is required.
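
The strict-AMG rule described above (keep rows with `M`, drop rows carrying any of `A`/`P`/`T`/`N`) can be sketched in a few lines of Python. This is a minimal illustration and not the code in `distill.py`: it assumes `amg_flags` is a plain concatenation of single-letter flags, and the helper name `is_strict_amg`, the `gene_id` field, and the example rows are all hypothetical.

```python
# Flags that disqualify a row from the strict-AMG sheet, per the README text.
EXCLUDING_FLAGS = set("APTN")


def is_strict_amg(amg_flags: str) -> bool:
    """Return True when the flag string has M and none of A, P, T, N.

    Assumes amg_flags is a concatenation of single-letter flags,
    e.g. "MK" or "MKT"; an empty or missing value never qualifies.
    """
    flags = set(amg_flags or "")
    return "M" in flags and not (flags & EXCLUDING_FLAGS)


# Hypothetical rows standing in for lines of raw-annotations.tsv.
rows = [
    {"gene_id": "gene_1", "amg_flags": "MK"},   # metabolic, no exclusions
    {"gene_id": "gene_2", "amg_flags": "MKT"},  # excluded by T
    {"gene_id": "gene_3", "amg_flags": "MN"},   # excluded by N
    {"gene_id": "gene_4", "amg_flags": "F"},    # no M flag
    {"gene_id": "gene_5", "amg_flags": ""},     # no flags at all
]

strict = [row["gene_id"] for row in rows if is_strict_amg(row["amg_flags"])]
print(strict)  # ['gene_1']
```

Rows dropped by this rule are only absent from the `AMG` sheet; as noted above, they remain in `raw-annotations.tsv` with their flags intact, so the same predicate can be inverted to list the genes that were set aside for review.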

## Nextflow Tips and Tricks

The `-resume` option in Nextflow DSL2 allows you to efficiently manage and modify your workflow runs:
86 changes: 0 additions & 86 deletions assets/internal/generate_sql_database.py

This file was deleted.
