Commit 01ae5bd

fix(mkdir): PL + Hydra with DDP makes runtime.dir not follow the default; hard-coded for now.
Track the related upstream Hydra issue: facebookresearch/hydra#2070. docs(readme): fix #2 about the missing hyperlink in the dataprocess commands.
1 parent 76f25ab commit 01ae5bd

5 files changed: 18 additions & 11 deletions


1_train.py (8 additions & 4 deletions)

````diff
@@ -3,7 +3,8 @@
 # Copyright (C) 2023-now, RPL, KTH Royal Institute of Technology
 # Author: Qingwen Zhang (https://kin-zhang.github.io/)
 #
-# This file is part of DeFlow (https://github.com/KTH-RPL/DeFlow).
+# This file is part of DeFlow (https://github.com/KTH-RPL/DeFlow) and
+# SeFlow (https://github.com/KTH-RPL/SeFlow) projects.
 # If you find this repo helpful, please cite the respective publication as
 # listed on the above website.
@@ -47,17 +48,20 @@ def main(cfg):
                             collate_fn=collate_fn_pad,
                             pin_memory=True)
 
-
     # count gpus, overwrite gpus
     cfg.gpus = torch.cuda.device_count() if torch.cuda.is_available() else 0
 
-    # only for logging on folder name.
+    output_dir = HydraConfig.get().runtime.output_dir
+    # overwrite logging folder name for SSL.
     if cfg.loss_fn == 'seflowLoss':
         cfg.output = cfg.output.replace(cfg.model.name, "seflow")
+        output_dir = output_dir.replace(cfg.model.name, "seflow")
         method_name = "seflow"
     else:
         method_name = cfg.model.name
-    output_dir = HydraConfig.get().runtime.output_dir + f"/{cfg.output}"
+
+    # FIXME: hydra output_dir with ddp run will mkdir in the parent folder. Looks like PL and Hydra trying to fix in lib.
+    # print(f"Output Directory: {output_dir} in gpu rank: {torch.cuda.current_device()}")
     Path(os.path.join(output_dir, "checkpoints")).mkdir(parents=True, exist_ok=True)
 
     cfg = DictConfig(OmegaConf.to_container(cfg, resolve=True))
````
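The logic of this hunk can be isolated outside of Hydra: rename the model part of the run folder when the self-supervised loss is selected, then create the checkpoint directory in a way that is safe when several DDP ranks reach it concurrently. A minimal hypothetical sketch (not the repo's exact code; the path and names are made up):

```python
from pathlib import Path
import tempfile

def resolve_output_dir(runtime_output_dir: str, model_name: str, loss_fn: str) -> str:
    # For SSL (seflowLoss), rename the model part of the folder to "seflow",
    # mirroring the replace() done on Hydra's runtime.output_dir.
    if loss_fn == "seflowLoss":
        return runtime_output_dir.replace(model_name, "seflow")
    return runtime_output_dir

def make_checkpoint_dir(output_dir: str) -> Path:
    # parents=True + exist_ok=True keeps this safe when multiple
    # DDP ranks call it at (nearly) the same time.
    ckpt = Path(output_dir) / "checkpoints"
    ckpt.mkdir(parents=True, exist_ok=True)
    return ckpt

base = tempfile.mkdtemp()
out = resolve_output_dir(f"{base}/logs/jobs/deflow/07-16-10-30", "deflow", "seflowLoss")
ckpt = make_checkpoint_dir(out)
print(ckpt.is_dir())  # → True
```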

README.md (7 additions & 4 deletions)

````diff
@@ -80,7 +80,7 @@ Note: Prepare raw data and process train data only needed run once for the task.
 ### Data Preparation
 
 Check [dataprocess/README.md](dataprocess/README.md#argoverse-20) for downloading tips for the raw Argoverse 2 dataset. Or maybe you want to have the **mini processed dataset** to try the code quickly, We directly provide one scene inside `train` and `val`. It already converted to `.h5` format and processed with the label data.
-You can download it from [Zenodo](https://zenodo.org/record/12751363) and extract it to the data folder. And then you can skip following steps and directly run the [training script](#train-the-model).
+You can download it from [Zenodo](https://zenodo.org/records/12751363/files/demo_data.zip) and extract it to the data folder. And then you can skip following steps and directly run the [training script](#train-the-model).
 
 ```bash
 wget https://zenodo.org/record/12751363/files/demo_data.zip
@@ -89,7 +89,8 @@ unzip demo_data.zip -p /home/kin/data/av2
 
 #### Prepare raw data
 
-Extract all data to unified h5 format. [Runtime: Normally need 10 mins finished run following commands totally in my desktop, 45 mins for the cluster I used]
+Checking more information (download raw data etc) in [dataprocess/README.md](dataprocess/README.md). Extract all data to unified h5 format.
+[Runtime: Normally need 10 mins finished run following commands totally in my desktop, 45 mins for the cluster I used]
 ```bash
 python dataprocess/extract_av2.py --av2_type sensor --data_mode train --argo_dir /home/kin/data/av2 --output_dir /home/kin/data/av2/preprocess_v2
 python dataprocess/extract_av2.py --av2_type sensor --data_mode val --mask_dir /home/kin/data/av2/3d_scene_flow
@@ -122,8 +123,10 @@ python 1_train.py model=fastflow3d lr=2e-4 epochs=20 batch_size=16 loss_fn=ff3dL
 python 1_train.py model=deflow lr=2e-4 epochs=20 batch_size=16 loss_fn=deflowLoss
 ```
 
-Note: You may found the different settings in the paper that is all methods are enlarge learning rate to 2e-4 and decrease the epochs to 20 for faster converge (Through analysis, we also found it had better performance).
-However, we kept the setting on lr=2e-6 and 50 epochs in the paper experiment for the fair comparison with ZeroFlow where we directly use their provided weights.
+> [!NOTE]
+> You may found the different settings in the paper that is all methods are enlarge learning rate to 2e-4 and decrease the epochs to 20 for faster converge and better performance.
+> However, we kept the setting on lr=2e-6 and 50 epochs in (SeFlow & DeFlow) paper experiments for the fair comparison with ZeroFlow where we directly use their provided weights.
+> We suggest afterward researchers or users to use the setting here (larger lr and smaller epoch) for faster converge and better performance.
 
 ## 2. Evaluation
 
````

conf/config.yaml (1 addition & 1 deletion)

```diff
@@ -28,7 +28,7 @@ gradient_clip_val: 5.0
 # optimizer ==> Adam
 lr: 2e-6
 loss_fn: seflowLoss # choices: [ff3dLoss, zeroflowLoss, deflowLoss, seflowLoss]
-add_seloss:
+add_seloss: # {chamfer_dis: 1.0, static_flow_loss: 1.0, dynamic_chamfer_dis: 1.0, cluster_based_pc0pc1: 1.0}
 
 # log settings
 seed: 42069
```
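The commented `add_seloss` entry reads like a dict of per-term weights for the SSL loss. A hypothetical sketch of how such a weight dict could be folded into one scalar loss (the term names follow the comment in conf/config.yaml; the loss values here are made-up numbers, not from the repo):

```python
# Weights as in the add_seloss comment.
weights = {
    "chamfer_dis": 1.0,
    "static_flow_loss": 1.0,
    "dynamic_chamfer_dis": 1.0,
    "cluster_based_pc0pc1": 1.0,
}

def weighted_total(terms: dict, weights: dict) -> float:
    # Sum w_i * loss_i over the terms named in the weight dict.
    return sum(weights[name] * terms[name] for name in weights)

# Made-up per-term loss values, just to exercise the sum.
terms = {
    "chamfer_dis": 0.5,
    "static_flow_loss": 0.2,
    "dynamic_chamfer_dis": 0.1,
    "cluster_based_pc0pc1": 0.4,
}
print(round(weighted_total(terms, weights), 6))  # → 1.2
```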

conf/hydra/default.yaml (1 addition & 1 deletion)

```diff
@@ -1,2 +1,2 @@
 run:
-  dir: logs/wandb
+  dir: logs/jobs/${output}/${now:%m-%d-%H-%M}
```
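Hydra's `${now:%m-%d-%H-%M}` resolver formats the launch time with strftime codes, so each run lands in a timestamped folder under `logs/jobs/<output>/`. A small sketch of what the new pattern expands to, reproduced with plain `datetime` formatting (the `output` value here is made up):

```python
from datetime import datetime

def run_dir(output: str, now: datetime) -> str:
    # Mirrors logs/jobs/${output}/${now:%m-%d-%H-%M} from conf/hydra/default.yaml.
    return f"logs/jobs/{output}/{now.strftime('%m-%d-%H-%M')}"

print(run_dir("deflow-e20", datetime(2024, 7, 16, 10, 30)))
# → logs/jobs/deflow-e20/07-16-10-30
```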

dataprocess/README.md (1 addition & 1 deletion)

````diff
@@ -34,7 +34,7 @@ s5cmd --no-sign-request cp "s3://argoverse/datasets/av2/sensor/test/*" sensor/te
 s5cmd --no-sign-request cp "s3://argoverse/tasks/3d_scene_flow/zips/*" .
 ```
 
-Then to quickly pre-process the data, we can [read more detail](../preprocess/README.md) on how to generate the pre-processed data for training and evaluation. This will take around 2 hour for the whole dataset (train & val) based on how powerful your CPU is.
+Then to quickly pre-process the data, we can [read these commands](#process) on how to generate the pre-processed data for training and evaluation. This will take around 0.5-2 hour for the whole dataset (train & val) based on how powerful your CPU is.
 
 More [self-supervised data in AV2 LiDAR only](https://www.argoverse.org/av2.html#lidar-link), note: It **does not** include **imagery or 3D annotations**. The dataset is designed to support research into self-supervised learning in the lidar domain, as well as point cloud forecasting.
 ```bash
````
