
Commit 2142663

chore(process): process dufo&label for training data to do SSL.

close #2, #4. docs(readme): update readme. docs(env): set up a new data process env to avoid package conflict. data(av2): if no gt, then only the data itself will be written.

1 parent d8198f2

7 files changed: 262 additions & 11 deletions


README.md

Lines changed: 10 additions & 8 deletions
````diff
@@ -8,8 +8,7 @@ SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
 
 ![](assets/docs/seflow_arch.png)
 
-2024/07/16 17:18: Most of codes already uploaded and tested. You can to try training directly by [downloading](https://zenodo.org/records/13744999) demo data or pretrained weight for evaluation.
-The process script will be public when the paper is published officially.
+2024/09/26 16:24: All codes already uploaded and tested. You can try training directly by [downloading](https://zenodo.org/records/13744999) demo data, or use the pretrained weights for evaluation.
 
 Pre-trained weights for models are available in [Zenodo](https://zenodo.org/records/13744999) link. Check usage in [2. Evaluation](#2-evaluation) or [3. Visualization](#3-visualization).
 
@@ -88,8 +87,8 @@ unzip demo_data.zip -p /home/kin/data/av2
 
 #### Prepare raw data
 
-Checking more information (step for downloading raw data, storage size, #frame etc) in [dataprocess/README.md](dataprocess/README.md). Extract all data to unified h5 format.
-[Runtime: Normally need 10 mins finished run following commands totally in my desktop, 45 mins for the cluster I used]
+Check more information (steps for downloading raw data, storage size, #frames, etc.) in [dataprocess/README.md](dataprocess/README.md). Extract all data to a unified `.h5` format.
+[Runtime: normally around 45 mins to run the following commands in the setup mentioned in our paper]
 
 ```bash
 python dataprocess/extract_av2.py --av2_type sensor --data_mode train --argo_dir /home/kin/data/av2 --output_dir /home/kin/data/av2/preprocess_v2
 python dataprocess/extract_av2.py --av2_type sensor --data_mode val --mask_dir /home/kin/data/av2/3d_scene_flow
@@ -172,11 +171,14 @@ https://github.com/user-attachments/assets/f031d1a2-2d2f-4947-a01f-834ed1c146e6
 ## Cite & Acknowledgements
 
 ```
-@article{zhang2024seflow,
+@inproceedings{zhang2024seflow,
   author={Zhang, Qingwen and Yang, Yi and Li, Peizheng and Andersson, Olov and Jensfelt, Patric},
-  title={SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving},
-  journal={arXiv preprint arXiv:2407.01702},
-  year={2024}
+  title={{SeFlow}: A Self-Supervised Scene Flow Method in Autonomous Driving},
+  booktitle={European Conference on Computer Vision (ECCV)},
+  year={2024},
+  pages={353--369},
+  organization={Springer},
+  doi={10.1007/978-3-031-73232-4_20},
 }
 @inproceedings{zhang2024deflow,
   author={Zhang, Qingwen and Yang, Yi and Fang, Heng and Geng, Ruoyu and Jensfelt, Patric},
````

assets/cuda/chamfer3D/README.md

Lines changed: 70 additions & 0 deletions
## Misc

### Note for CUDA ChamferDis

I wrote this two months ago and could no longer understand it; the trigger was that the result was always off by about 0.0003 (I am picky about precision). At first I assumed I had a bug, but it turned out that with this kind of block-level parallelization, different thread-block sizes change the order of CUDA floating-point operations, so a small precision gap is expected. If this bothers you, you can use the pytorch3d version instead (about 4x slower, from 15 ms to 80 ms).

To restate how shared memory is used here:

1. Each point first gets its global index via `int tid = blockIdx.x * blockDim.x + threadIdx.x;`. Every point is handled independently, since the nearest neighbor of a pc0 point in pc1 does not depend on the other pc0 points.
2. Inside each block we declare a `__shared__` buffer for pc1; because shared memory is limited, only `THREADS_PER_BLOCK` points are staged at a time.
3. Staging those `THREADS_PER_BLOCK` points is also done per thread; before comparing distances we call `__syncthreads();` to make sure all `THREADS_PER_BLOCK` pc1 points have arrived.
4. We then compare distances within this `num_elems` chunk of data and update the best match.
5. Finally, the best match is written to this point's global `result`.

Note that this aggressive parallelization slightly affects precision. If you are curious, you can tune `#define THREADS_PER_BLOCK 256`; a different thread count per block changes the result a little (the effect is small: the ground truth is 0.1710, while the CUDA result lands between 0.1711 and 0.1713).

The following is from ChatGPT:

One reason for the precision difference may be that the order of floating-point operations changes with the thread-block size. Since floating-point addition is not associative (i.e., (a + b) + c may not equal a + (b + c)), changing the order of operations can cause slight differences in the result.

This kind of precision variation is very common in GPU computing, especially with large datasets and many floating-point operations. Eliminating it completely is very hard, because even tiny implementation changes (thread-block size, loop structure, different GPU hardware, or a different CUDA version) can perturb the order of floating-point operations.

If you need bitwise-consistent results, consider the following:

1. Fixed thread-block size: pick one block size and always use it.
2. Double precision: use `double` instead of `float` for higher precision, at the cost of more memory and possibly lower performance.
3. Numerically stable algorithms: prefer numerically stable formulations, though these can be complex and slower on a GPU.
4. Less parallelism: reduce parallelism to limit differences caused by thread execution order, usually at the cost of performance.
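The non-associativity point is easy to check on the CPU as well; a minimal Python sketch (not part of the repo):

```python
# Floating-point addition is not associative, so changing the summation
# order (as different CUDA block sizes do) can change the result slightly.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)       # False
print(abs(left - right))   # tiny but nonzero difference
```

The same effect, accumulated over millions of additions in a reduction, accounts for the 0.1710 vs 0.1711–0.1713 gap noted above.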
The relevant code:

```cpp
for (int i = 0; i < pc1_n; i += THREADS_PER_BLOCK) {
    // Copy a block of pc1 to shared memory
    int pc1_idx = i + threadIdx.x;
    if (pc1_idx < pc1_n) {
        shared_pc1[threadIdx.x * 3 + 0] = pc1_xyz[pc1_idx * 3 + 0];
        shared_pc1[threadIdx.x * 3 + 1] = pc1_xyz[pc1_idx * 3 + 1];
        shared_pc1[threadIdx.x * 3 + 2] = pc1_xyz[pc1_idx * 3 + 2];
    }

    __syncthreads();

    // Compute the distance between pc0[tid] and the points in shared_pc1
    // NOTE(Qingwen): since after two months I forgot what I did here, I write some notes for future me
    // 0. One reason for the difference in precision may be the changing order of floating-point
    //    operations at different thread-block sizes.
    //    But I think it's fine to lose 0.0001 precision for a 4x speedup in cal time.
    // 1. Since we use shared memory to store pc1, every block has a new shared_pc1 starting from 0.
    // 2. We use THREADS_PER_BLOCK to loop over pc1, so we need to check if the last block is not full.
    // 3. Based on the CUDA documentation, the __syncthreads() is not strictly necessary here,
    //    but we keep it for safety.
    // 4. After running once, we go to the next block of pc1 and find the best in that batch.

    int num_elems = min(THREADS_PER_BLOCK, pc1_n - i);
    for (int j = 0; j < num_elems; j++) {
        float x1 = shared_pc1[j * 3 + 0];
        float y1 = shared_pc1[j * 3 + 1];
        float z1 = shared_pc1[j * 3 + 2];
        float d = (x1 - x0) * (x1 - x0) + (y1 - y0) * (y1 - y0) + (z1 - z0) * (z1 - z0);
        if (d < best) {
            best = d;
            best_i = j + i;
        }
    }
    __syncthreads();
}
```
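For reference, the per-point search the kernel performs is equivalent to the following (much slower) NumPy version; this is a sketch for clarity, not part of the repo:

```python
import numpy as np

rng = np.random.default_rng(0)
pc0 = rng.random((5, 3))   # query points
pc1 = rng.random((8, 3))   # reference points

# Squared distance from every pc0 point to every pc1 point, then the minimum:
# the same quantity the kernel accumulates into `best` / `best_i` per thread.
d2 = ((pc0[:, None, :] - pc1[None, :, :]) ** 2).sum(axis=-1)  # shape (5, 8)
best = d2.min(axis=1)       # squared nearest-neighbor distance per pc0 point
best_i = d2.argmin(axis=1)  # index of the nearest pc1 point
chamfer_0to1 = best.mean()  # one direction of the Chamfer distance
```

Because NumPy sums in a fixed order, this reference is deterministic, which makes it handy for checking the CUDA result up to the precision gap discussed above.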

## Other issues
When building the CUDA extensions on a cluster, you may encounter this problem:

dataprocess/README.md

Lines changed: 7 additions & 0 deletions
````diff
@@ -12,6 +12,13 @@ We've updated the process dataset for:
 - [x] Waymo: check [here](#waymo-dataset). The process script was involved from [SeFlow](https://github.com/KTH-RPL/SeFlow).
 - [ ] nuScenes: done coding, public after review. Will be involved later by another paper.
 
+If you want to use all the datasets above, there is a dedicated process environment in [envprocess.yml](../envprocess.yml) that installs all the necessary packages, since the Waymo package has a different configuration and conflicts with the main environment. Set it up with the following commands:
+
+```bash
+conda env create -f envprocess.yml
+conda activate dataprocess
+```
+
 ## Download
 
 ### Argoverse 2.0
````

dataprocess/extract_av2.py

Lines changed: 4 additions & 2 deletions
```diff
@@ -208,6 +208,8 @@ def create_group_data(group, pc, gm, pose, flow_0to1=None, flow_valid=None, flow
                            for file in os.listdir(data_dir / log_id / "sensors/lidar")
                            if file.endswith('.feather')])
 
+    gt_flow_flag = False if not (data_dir / log_id / "annotations.feather").exists() else True
+
     # if n is not None:
     #     iter_bar = tqdm(zip(timestamps, timestamps[1:]), leave=False,
     #                     total=len(timestamps) - 1, position=n,
@@ -222,7 +224,7 @@ def create_group_data(group, pc, gm, pose, flow_0to1=None, flow_valid=None, flow
         if pc0.shape[0] < 256:
             print(f'{log_id}/{ts0} has less than 256 points, skip this scenarios. Please check the data if needed.')
             break
-        if cnt == len(timestamps) - 1:
+        if cnt == len(timestamps) - 1 or not gt_flow_flag:
             create_group_data(group, pc0, is_ground_0.astype(np.bool_), pose0.transform_matrix.astype(np.float32))
         else:
             ts1 = timestamps[cnt + 1]
@@ -269,7 +271,7 @@ def main(
     argo_dir: str = "/home/kin/data/av2",
     output_dir: str = "/home/kin/data/av2/preprocess",
     av2_type: str = "sensor",
-    data_mode: str = "test",
+    data_mode: str = "val",
     mask_dir: str = "/home/kin/data/av2/3d_scene_flow",
     nproc: int = (multiprocessing.cpu_count() - 1),
     only_index: bool = False,
```
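As an aside on the added `gt_flow_flag` line: for any boolean expression `x`, `False if not x else True` is just `x`, so the flag boils down to `Path.exists()` on the annotations file. A tiny sketch with a made-up path:

```python
from pathlib import Path

# Hypothetical path for illustration only; nothing is created on disk.
p = Path("made_up_dir") / "annotations.feather"

verbose = False if not p.exists() else True  # style used in the commit
concise = p.exists()                         # equivalent and shorter
print(verbose == concise)  # True
```

Either form yields `False` for the AV2 test split, where `annotations.feather` is absent, which is what triggers the "write only the data itself" branch above.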

environment.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -29,10 +29,12 @@ dependencies:
   - open3d==0.18.0
   - dztimer
   - av2==0.2.1
+  - dufomap==1.0.0
 
 # Reason about the version fixed:
 # setuptools==68.5.1: https://github.com/aws-neuron/aws-neuron-sdk/issues/893
 # mkl==2024.0.0: https://github.com/pytorch/pytorch/issues/123097#issue-2218541307
 # av2==0.2.1: in case other version deleted some functions.
 # lightning==2.0.1: https://stackoverflow.com/questions/76647518/how-to-fix-error-cannot-import-name-modelmetaclass-from-pydantic-main
 # open3d==0.18.0: because 0.17.0 have bug on set the view json file
+# dufomap==1.0.0: in case later updates are not compatible with the code.
```

envprocess.yaml

Lines changed: 28 additions & 0 deletions
```yaml
name: dataprocess
channels:
  - conda-forge
  - pytorch
dependencies:
  - python=3.8
  - pytorch::pytorch=2.0.0
  - pytorch::torchvision
  - numba
  - numpy==1.22
  - pandas
  - pip
  - scipy
  - tqdm
  - scikit-learn
  - fire
  - pip:
      - nuscenes-devkit
      - av2==0.2.1
      - waymo-open-dataset-tf-2.11.0==1.5.0
      - open3d==0.16.0
      - linefit
      - dztimer
      - dufomap==1.0.0
      - evalai

# Reason about the version fixed:
# numpy==1.22: package conflicts; need numpy >= 1.22
```

process.py

Lines changed: 141 additions & 1 deletion
```python
# Copyright (C) 2023-now, RPL, KTH Royal Institute of Technology
# Author: Qingwen Zhang (https://kin-zhang.github.io/)
#
# This file is part of SeFlow (https://github.com/KTH-RPL/SeFlow).
# If you find this repo helpful, please cite the respective publication as
# listed on the above website.
#
# Description: run dufomap on the dataset we preprocessed for afterward ssl training.
#              it's only needed for ssl train but not inference.
#              Goal to segment dynamic and static point roughly.
"""

from pathlib import Path
from tqdm import tqdm
import numpy as np
import fire, time, h5py, os
from hdbscan import HDBSCAN

from src.utils.mics import HDF5Data, transform_to_array
from dufomap import dufomap

MIN_AXIS_RANGE = 2   # HARD CODED: remove ego vehicle points
MAX_AXIS_RANGE = 50  # HARD CODED: remove far away points

def run_cluster(
    data_dir: str = "/home/kin/data/av2/preprocess/sensor/train",
    scene_range: list = [0, 1],
    interval: int = 1,  # unused here, kept for a consistent interface
    overwrite: bool = False,
):
    data_path = Path(data_dir)
    dataset = HDF5Data(data_path)
    all_scene_ids = list(dataset.scene_id_bounds.keys())
    for scene_in_data_index, scene_id in enumerate(all_scene_ids):
        start_time = time.time()
        # NOTE (Qingwen): the scene id range is [start, end)
        if scene_range[0] != -1 and scene_range[-1] != -1 and (scene_in_data_index < scene_range[0] or scene_in_data_index >= scene_range[1]):
            continue
        bounds = dataset.scene_id_bounds[scene_id]
        flag_exist_label = True
        with h5py.File(os.path.join(data_path, f'{scene_id}.h5'), 'r+') as f:
            for ii in range(bounds["min_index"], bounds["max_index"] + 1):
                key = str(dataset[ii]['timestamp'])
                if 'label' not in f[key]:
                    flag_exist_label = False
                    break
        if flag_exist_label and not overwrite:
            print(f"==> Scene {scene_id} has plus label, skip.")
            continue

        hdb = HDBSCAN(min_cluster_size=20, cluster_selection_epsilon=0.7)
        for i in tqdm(range(bounds["min_index"], bounds["max_index"] + 1), desc=f"Start Plus Cluster: {scene_in_data_index}/{len(all_scene_ids)}", ncols=80):
            data = dataset[i]
            pc0 = data['pc0'][:, :3]
            if "dufo_label" not in data:
                print(f"Warning: {scene_id} {data['timestamp']} has no dufo_label, will be skipped. Better to rerun dufomap again in this scene.")
                continue

            cluster_label = np.zeros(pc0.shape[0], dtype=np.int16)
            hdb.fit(pc0[data["dufo_label"] == 1])
            # NOTE(Qingwen): -1 is assigned if no cluster, so shift labels to make it 0.
            cluster_label[data["dufo_label"] == 1] = hdb.labels_ + 1

            # save labels
            timestamp = data['timestamp']
            key = str(timestamp)
            with h5py.File(os.path.join(data_path, f'{scene_id}.h5'), 'r+') as f:
                if 'label' in f[key]:
                    # print(f"Warning: {scene_id} {timestamp} has label, will be overwritten.")
                    del f[key]['label']
                f[key].create_dataset('label', data=np.array(cluster_label).astype(np.int16))
        print(f"==> Scene {scene_id} finished, used: {(time.time() - start_time)/60:.2f} mins")
    print(f"Data inside {str(data_path)} finished. Check the result with vis() function if you want to visualize them.")

def run_dufo(
    data_dir: str = "/home/kin/data/av2/preprocess/sensor/train",
    scene_range: list = [0, 1],
    interval: int = 1,  # interval frames to run dufomap
    overwrite: bool = False,
):
    data_path = Path(data_dir)
    dataset = HDF5Data(data_path)
    all_scene_ids = list(dataset.scene_id_bounds.keys())
    for scene_in_data_index, scene_id in enumerate(all_scene_ids):
        start_time = time.time()
        # NOTE (Qingwen): the scene id range is [start, end)
        if scene_range[0] != -1 and scene_range[-1] != -1 and (scene_in_data_index < scene_range[0] or scene_in_data_index >= scene_range[1]):
            continue
        bounds = dataset.scene_id_bounds[scene_id]
        flag_has_dufo_label = True
        with h5py.File(os.path.join(data_path, f'{scene_id}.h5'), 'r+') as f:
            for ii in range(bounds["min_index"], bounds["max_index"] + 1):
                key = str(dataset[ii]['timestamp'])
                if "dufo_label" not in f[key]:
                    flag_has_dufo_label = False
                    break
        if flag_has_dufo_label and not overwrite:
            print(f"==> Scene {scene_id} has dufo_label, skip.")
            continue

        mydufo = dufomap(0.2, 0.2, 1, num_threads=12)  # resolution, d_s, d_p, hit_extension
        mydufo.setCluster(0, 20, 0.2)  # depth=0, min_points=20, max_dist=0.2

        print(f"==> Scene {scene_id} start, data path: {data_path}")
        for i in tqdm(range(bounds["min_index"], bounds["max_index"] + 1), desc=f"Dufo run: {scene_in_data_index}/{len(all_scene_ids)}", ncols=80):
            if interval != 1 and i % interval != 0 and (i + interval//2 < bounds["max_index"] or i - interval//2 > bounds["min_index"]):
                continue
            data = dataset[i]
            assert data['scene_id'] == scene_id, f"Check the data, scene_id {scene_id} is not consistent in {i}th data in {scene_in_data_index}th scene."
            # HARD CODED: remove points outside the range
            norm_pc0 = np.linalg.norm(data['pc0'][:, :3], axis=1)
            range_mask = (
                (norm_pc0 > MIN_AXIS_RANGE) &
                (norm_pc0 < MAX_AXIS_RANGE)
            )
            pose_array = transform_to_array(data['pose0'])
            mydufo.run(data['pc0'][range_mask], pose_array, cloud_transform=True)

        # finished integrating, start segmenting; needed since we have map.label inside dufo
        mydufo.oncePropagateCluster(if_cluster=True, if_propagate=True)
        for i in tqdm(range(bounds["min_index"], bounds["max_index"] + 1), desc=f"Start Segment: {scene_in_data_index}/{len(all_scene_ids)}", ncols=80):
            data = dataset[i]
            pc0 = data['pc0']
            gm0 = data['gm0']
            pose_array = transform_to_array(data['pose0'])
            dufo_label = np.array(mydufo.segment(pc0, pose_array, cloud_transform=True))
            dufo_labels = np.zeros(pc0.shape[0], dtype=np.uint8)
            dufo_labels[~gm0] = dufo_label[~gm0]

            # save labels
            timestamp = data['timestamp']
            key = str(timestamp)
            with h5py.File(os.path.join(data_path, f'{scene_id}.h5'), 'r+') as f:
                if "dufo_label" in f[key]:
                    # print(f"Warning: {scene_id} {timestamp} has label, will be overwritten.")
                    del f[key]["dufo_label"]
                f[key].create_dataset("dufo_label", data=np.array(dufo_labels).astype(np.uint8))
        print(f"==> Scene {scene_id} finished, used: {(time.time() - start_time)/60:.2f} mins")
    print(f"Data inside {str(data_path)} finished. Check the result with vis() function if you want to visualize them.")

if __name__ == '__main__':
    start_time = time.time()
    # step 1: run dufomap
    fire.Fire(run_dufo)
    # step 2: run cluster on dufo_label
    fire.Fire(run_cluster)

    print(f"\nTime used: {(time.time() - start_time)/60:.2f} mins")
```
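One detail in `run_cluster` worth noting: HDBSCAN marks noise points with label -1, so `hdb.labels_ + 1` shifts noise to 0 and real clusters to 1..k before the `label` dataset is written. A small NumPy sketch of that relabeling (the label values and mask here are made up for illustration):

```python
import numpy as np

# Made-up clusterer output for the dynamic points: -1 marks noise,
# 0..k-1 are cluster ids (stand-in for hdb.labels_).
labels_ = np.array([-1, 0, 0, 1, -1, 2])
# Stand-in for dufo_label == 1: which points were fed to the clusterer.
dynamic_mask = np.array([True, False, True, True, True, True, False, True])

# Shift by 1 so that 0 means "no cluster" in the saved label array;
# static points (mask False) also stay 0.
cluster_label = np.zeros(dynamic_mask.shape[0], dtype=np.int16)
cluster_label[dynamic_mask] = labels_ + 1
print(cluster_label.tolist())  # [0, 0, 1, 1, 2, 0, 0, 3]
```

This gives a single convention downstream: label 0 is "static or unclustered", and any positive value is a dynamic cluster id.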
