
Commit 2142663

chore(process): process dufo&label for training data to do SSL.

close #2, #4. docs(readme): update readme. docs(env): set up a new data process env to avoid package conflict. data(av2): if no gt, then only the data itself will be written.

1 parent d8198f2

7 files changed: 262 additions & 11 deletions


README.md

Lines changed: 10 additions & 8 deletions
````diff
@@ -8,8 +8,7 @@ SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
 
 ![](assets/docs/seflow_arch.png)
 
-2024/07/16 17:18: Most of codes already uploaded and tested. You can to try training directly by [downloading](https://zenodo.org/records/13744999) demo data or pretrained weight for evaluation.
-The process script will be public when the paper is published officially.
+2024/09/26 16:24: All codes already uploaded and tested. You can try training directly by [downloading](https://zenodo.org/records/13744999) demo data, or use the pretrained weights for evaluation.
 
 Pre-trained weights for models are available in [Zenodo](https://zenodo.org/records/13744999) link. Check usage in [2. Evaluation](#2-evaluation) or [3. Visualization](#3-visualization).
 
@@ -88,8 +87,8 @@ unzip demo_data.zip -p /home/kin/data/av2
 
 #### Prepare raw data
 
-Checking more information (step for downloading raw data, storage size, #frame etc) in [dataprocess/README.md](dataprocess/README.md). Extract all data to unified h5 format.
-[Runtime: Normally need 10 mins finished run following commands totally in my desktop, 45 mins for the cluster I used]
+Check more information (steps for downloading raw data, storage size, #frames, etc.) in [dataprocess/README.md](dataprocess/README.md). Extract all data to a unified `.h5` format.
+[Runtime: normally around 45 mins to run the following commands in the setup mentioned in our paper]
 
 ```bash
 python dataprocess/extract_av2.py --av2_type sensor --data_mode train --argo_dir /home/kin/data/av2 --output_dir /home/kin/data/av2/preprocess_v2
 python dataprocess/extract_av2.py --av2_type sensor --data_mode val --mask_dir /home/kin/data/av2/3d_scene_flow
@@ -172,11 +171,14 @@ https://github.com/user-attachments/assets/f031d1a2-2d2f-4947-a01f-834ed1c146e6
 ## Cite & Acknowledgements
 
 ```
-@article{zhang2024seflow,
+@inproceedings{zhang2024seflow,
   author={Zhang, Qingwen and Yang, Yi and Li, Peizheng and Andersson, Olov and Jensfelt, Patric},
-  title={SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving},
-  journal={arXiv preprint arXiv:2407.01702},
-  year={2024}
+  title={{SeFlow}: A Self-Supervised Scene Flow Method in Autonomous Driving},
+  booktitle={European Conference on Computer Vision (ECCV)},
+  year={2024},
+  pages={353--369},
+  organization={Springer},
+  doi={10.1007/978-3-031-73232-4_20},
 }
 @inproceedings{zhang2024deflow,
   author={Zhang, Qingwen and Yang, Yi and Fang, Heng and Geng, Ruoyu and Jensfelt, Patric},
````

assets/cuda/chamfer3D/README.md

Lines changed: 70 additions & 0 deletions
## Misc

### Note for CUDA ChamferDis

I wrote this two months ago and could no longer understand it; the trigger was that the result was always off by about 0.0003 (I am picky about precision). At first I assumed I had a bug, but it turned out that with this kind of block-level parallelization, different thread-block sizes change the order of CUDA floating-point operations, so a small precision gap is expected. If this bothers you, you can use the pytorch3d version instead (about 4x slower, from 15 ms to 80 ms).

To restate how shared memory is used here:

1. Each point first gets its global index via `int tid = blockIdx.x * blockDim.x + threadIdx.x;`. Every point is handled independently, since the nearest neighbor of a pc0 point in pc1 does not depend on the other pc0 points.
2. Inside each block we declare a `__shared__` buffer for pc1; because shared memory is limited, only `THREADS_PER_BLOCK` points are staged at a time.
3. Staging those `THREADS_PER_BLOCK` points is also done per thread; before comparing distances we call `__syncthreads();` to make sure all `THREADS_PER_BLOCK` pc1 points have arrived.
4. We then compare distances within this `num_elems` chunk of data and update the best match.
5. Finally, the best match is written to this point's global `result`.

Note that this aggressive parallelization slightly affects precision. If you are curious, you can tune `#define THREADS_PER_BLOCK 256`; a different thread count per block changes the result a little (the effect is small: the ground truth is 0.1710, while the CUDA result lands between 0.1711 and 0.1713).

The following is from ChatGPT:

One reason for the precision difference may be that the order of floating-point operations changes with the thread-block size. Since floating-point addition is not associative (i.e., (a + b) + c may not equal a + (b + c)), changing the order of operations can cause slight differences in the result.

This kind of precision variation is very common in GPU computing, especially with large datasets and many floating-point operations. Eliminating it completely is very hard, because even tiny implementation changes (thread-block size, loop structure, different GPU hardware, or a different CUDA version) can perturb the order of floating-point operations.

If you need bitwise-consistent results, consider the following:

1. Fixed thread-block size: pick one block size and always use it.
2. Double precision: use `double` instead of `float` for higher precision, at the cost of more memory and possibly lower performance.
3. Numerically stable algorithms: prefer numerically stable formulations, though these can be complex and slower on a GPU.
4. Less parallelism: reduce parallelism to limit differences caused by thread execution order, usually at the cost of performance.
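The non-associativity point is easy to check on the CPU as well; a minimal Python sketch (not part of the repo):

```python
# Floating-point addition is not associative, so changing the summation
# order (as different CUDA block sizes do) can change the result slightly.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)       # False
print(abs(left - right))   # tiny but nonzero difference
```

The same effect, accumulated over millions of additions in a reduction, accounts for the 0.1710 vs 0.1711–0.1713 gap noted above.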
The relevant code:

```cpp
for (int i = 0; i < pc1_n; i += THREADS_PER_BLOCK) {
    // Copy a block of pc1 to shared memory
    int pc1_idx = i + threadIdx.x;
    if (pc1_idx < pc1_n) {
        shared_pc1[threadIdx.x * 3 + 0] = pc1_xyz[pc1_idx * 3 + 0];
        shared_pc1[threadIdx.x * 3 + 1] = pc1_xyz[pc1_idx * 3 + 1];
        shared_pc1[threadIdx.x * 3 + 2] = pc1_xyz[pc1_idx * 3 + 2];
    }

    __syncthreads();

    // Compute the distance between pc0[tid] and the points in shared_pc1
    // NOTE(Qingwen): since after two months I forgot what I did here, I write some notes for future me
    // 0. One reason for the difference in precision may be the changing order of floating-point
    //    operations at different thread-block sizes.
    //    But I think it's fine to lose 0.0001 precision for a 4x speedup in cal time.
    // 1. Since we use shared memory to store pc1, every block has a new shared_pc1 starting from 0.
    // 2. We use THREADS_PER_BLOCK to loop over pc1, so we need to check if the last block is not full.
    // 3. Based on the CUDA documentation, the __syncthreads() is not strictly necessary here,
    //    but we keep it for safety.
    // 4. After running once, we go to the next block of pc1 and find the best in that batch.

    int num_elems = min(THREADS_PER_BLOCK, pc1_n - i);
    for (int j = 0; j < num_elems; j++) {
        float x1 = shared_pc1[j * 3 + 0];
        float y1 = shared_pc1[j * 3 + 1];
        float z1 = shared_pc1[j * 3 + 2];
        float d = (x1 - x0) * (x1 - x0) + (y1 - y0) * (y1 - y0) + (z1 - z0) * (z1 - z0);
        if (d < best) {
            best = d;
            best_i = j + i;
        }
    }
    __syncthreads();
}
```
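For reference, the per-point search the kernel performs is equivalent to the following (much slower) NumPy version; this is a sketch for clarity, not part of the repo:

```python
import numpy as np

rng = np.random.default_rng(0)
pc0 = rng.random((5, 3))   # query points
pc1 = rng.random((8, 3))   # reference points

# Squared distance from every pc0 point to every pc1 point, then the minimum:
# the same quantity the kernel accumulates into `best` / `best_i` per thread.
d2 = ((pc0[:, None, :] - pc1[None, :, :]) ** 2).sum(axis=-1)  # shape (5, 8)
best = d2.min(axis=1)       # squared nearest-neighbor distance per pc0 point
best_i = d2.argmin(axis=1)  # index of the nearest pc1 point
chamfer_0to1 = best.mean()  # one direction of the Chamfer distance
```

Because NumPy sums in a fixed order, this reference is deterministic, which makes it handy for checking the CUDA result up to the precision gap discussed above.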

## Other issues
When building the CUDA extensions on a cluster, you may encounter this problem:

dataprocess/README.md

Lines changed: 7 additions & 0 deletions
````diff
@@ -12,6 +12,13 @@ We've updated the process dataset for:
 - [x] Waymo: check [here](#waymo-dataset). The process script was involved from [SeFlow](https://github.com/KTH-RPL/SeFlow).
 - [ ] nuScenes: done coding, public after review. Will be involved later by another paper.
 
+If you want to use all the datasets above, there is a dedicated process environment in [envprocess.yml](../envprocess.yml) that installs all the necessary packages, since the Waymo package has a different configuration and conflicts with the main environment. Set it up with the following commands:
+
+```bash
+conda env create -f envprocess.yml
+conda activate dataprocess
+```
+
 ## Download
 
 ### Argoverse 2.0
````

dataprocess/extract_av2.py

Lines changed: 4 additions & 2 deletions
```diff
@@ -208,6 +208,8 @@ def create_group_data(group, pc, gm, pose, flow_0to1=None, flow_valid=None, flow
                            for file in os.listdir(data_dir / log_id / "sensors/lidar")
                            if file.endswith('.feather')])
 
+    gt_flow_flag = False if not (data_dir / log_id / "annotations.feather").exists() else True
+
     # if n is not None:
     #     iter_bar = tqdm(zip(timestamps, timestamps[1:]), leave=False,
     #                     total=len(timestamps) - 1, position=n,
@@ -222,7 +224,7 @@ def create_group_data(group, pc, gm, pose, flow_0to1=None, flow_valid=None, flow
         if pc0.shape[0] < 256:
             print(f'{log_id}/{ts0} has less than 256 points, skip this scenarios. Please check the data if needed.')
             break
-        if cnt == len(timestamps) - 1:
+        if cnt == len(timestamps) - 1 or not gt_flow_flag:
             create_group_data(group, pc0, is_ground_0.astype(np.bool_), pose0.transform_matrix.astype(np.float32))
         else:
             ts1 = timestamps[cnt + 1]
@@ -269,7 +271,7 @@ def main(
     argo_dir: str = "/home/kin/data/av2",
     output_dir: str = "/home/kin/data/av2/preprocess",
     av2_type: str = "sensor",
-    data_mode: str = "test",
+    data_mode: str = "val",
     mask_dir: str = "/home/kin/data/av2/3d_scene_flow",
     nproc: int = (multiprocessing.cpu_count() - 1),
     only_index: bool = False,
```
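As an aside on the added `gt_flow_flag` line: for any boolean expression `x`, `False if not x else True` is just `x`, so the flag boils down to `Path.exists()` on the annotations file. A tiny sketch with a made-up path:

```python
from pathlib import Path

# Hypothetical path for illustration only; nothing is created on disk.
p = Path("made_up_dir") / "annotations.feather"

verbose = False if not p.exists() else True  # style used in the commit
concise = p.exists()                         # equivalent and shorter
print(verbose == concise)  # True
```

Either form yields `False` for the AV2 test split, where `annotations.feather` is absent, which is what triggers the "write only the data itself" branch above.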

environment.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -29,10 +29,12 @@ dependencies:
   - open3d==0.18.0
   - dztimer
   - av2==0.2.1
+  - dufomap==1.0.0
 
 # Reason about the version fixed:
 # setuptools==68.5.1: https://github.com/aws-neuron/aws-neuron-sdk/issues/893
 # mkl==2024.0.0: https://github.com/pytorch/pytorch/issues/123097#issue-2218541307
 # av2==0.2.1: in case other version deleted some functions.
 # lightning==2.0.1: https://stackoverflow.com/questions/76647518/how-to-fix-error-cannot-import-name-modelmetaclass-from-pydantic-main
 # open3d==0.18.0: because 0.17.0 have bug on set the view json file
+# dufomap==1.0.0: in case later updates are not compatible with the code.
```

envprocess.yaml

Lines changed: 28 additions & 0 deletions
```yaml
name: dataprocess
channels:
  - conda-forge
  - pytorch
dependencies:
  - python=3.8
  - pytorch::pytorch=2.0.0
  - pytorch::torchvision
  - numba
  - numpy==1.22
  - pandas
  - pip
  - scipy
  - tqdm
  - scikit-learn
  - fire
  - pip:
      - nuscenes-devkit
      - av2==0.2.1
      - waymo-open-dataset-tf-2.11.0==1.5.0
      - open3d==0.16.0
      - linefit
      - dztimer
      - dufomap==1.0.0
      - evalai

# Reason about the version fixed:
# numpy==1.22: package conflicts; need numpy >= 1.22
```

process.py

Lines changed: 141 additions & 1 deletion
```python
# Copyright (C) 2023-now, RPL, KTH Royal Institute of Technology
# Author: Qingwen Zhang (https://kin-zhang.github.io/)
#
# This file is part of SeFlow (https://github.com/KTH-RPL/SeFlow).
# If you find this repo helpful, please cite the respective publication as
# listed on the above website.
#
# Description: run dufomap on the dataset we preprocessed for afterward ssl training.
#              it's only needed for ssl train but not inference.
#              Goal to segment dynamic and static point roughly.
"""

from pathlib import Path
from tqdm import tqdm
import numpy as np
import fire, time, h5py, os
from hdbscan import HDBSCAN

from src.utils.mics import HDF5Data, transform_to_array
from dufomap import dufomap

MIN_AXIS_RANGE = 2   # HARD CODED: remove ego vehicle points
MAX_AXIS_RANGE = 50  # HARD CODED: remove far away points

def run_cluster(
    data_dir: str = "/home/kin/data/av2/preprocess/sensor/train",
    scene_range: list = [0, 1],
    interval: int = 1,  # unused here, kept for a consistent interface
    overwrite: bool = False,
):
    data_path = Path(data_dir)
    dataset = HDF5Data(data_path)
    all_scene_ids = list(dataset.scene_id_bounds.keys())
    for scene_in_data_index, scene_id in enumerate(all_scene_ids):
        start_time = time.time()
        # NOTE (Qingwen): the scene id range is [start, end)
        if scene_range[0] != -1 and scene_range[-1] != -1 and (scene_in_data_index < scene_range[0] or scene_in_data_index >= scene_range[1]):
            continue
        bounds = dataset.scene_id_bounds[scene_id]
        flag_exist_label = True
        with h5py.File(os.path.join(data_path, f'{scene_id}.h5'), 'r+') as f:
            for ii in range(bounds["min_index"], bounds["max_index"] + 1):
                key = str(dataset[ii]['timestamp'])
                if 'label' not in f[key]:
                    flag_exist_label = False
                    break
        if flag_exist_label and not overwrite:
            print(f"==> Scene {scene_id} has plus label, skip.")
            continue

        hdb = HDBSCAN(min_cluster_size=20, cluster_selection_epsilon=0.7)
        for i in tqdm(range(bounds["min_index"], bounds["max_index"] + 1), desc=f"Start Plus Cluster: {scene_in_data_index}/{len(all_scene_ids)}", ncols=80):
            data = dataset[i]
            pc0 = data['pc0'][:, :3]
            if "dufo_label" not in data:
                print(f"Warning: {scene_id} {data['timestamp']} has no dufo_label, will be skipped. Better to rerun dufomap again in this scene.")
                continue

            cluster_label = np.zeros(pc0.shape[0], dtype=np.int16)
            hdb.fit(pc0[data["dufo_label"] == 1])
            # NOTE(Qingwen): -1 is assigned if no cluster, so shift labels to make it 0.
            cluster_label[data["dufo_label"] == 1] = hdb.labels_ + 1

            # save labels
            timestamp = data['timestamp']
            key = str(timestamp)
            with h5py.File(os.path.join(data_path, f'{scene_id}.h5'), 'r+') as f:
                if 'label' in f[key]:
                    # print(f"Warning: {scene_id} {timestamp} has label, will be overwritten.")
                    del f[key]['label']
                f[key].create_dataset('label', data=np.array(cluster_label).astype(np.int16))
        print(f"==> Scene {scene_id} finished, used: {(time.time() - start_time)/60:.2f} mins")
    print(f"Data inside {str(data_path)} finished. Check the result with vis() function if you want to visualize them.")

def run_dufo(
    data_dir: str = "/home/kin/data/av2/preprocess/sensor/train",
    scene_range: list = [0, 1],
    interval: int = 1,  # interval frames to run dufomap
    overwrite: bool = False,
):
    data_path = Path(data_dir)
    dataset = HDF5Data(data_path)
    all_scene_ids = list(dataset.scene_id_bounds.keys())
    for scene_in_data_index, scene_id in enumerate(all_scene_ids):
        start_time = time.time()
        # NOTE (Qingwen): the scene id range is [start, end)
        if scene_range[0] != -1 and scene_range[-1] != -1 and (scene_in_data_index < scene_range[0] or scene_in_data_index >= scene_range[1]):
            continue
        bounds = dataset.scene_id_bounds[scene_id]
        flag_has_dufo_label = True
        with h5py.File(os.path.join(data_path, f'{scene_id}.h5'), 'r+') as f:
            for ii in range(bounds["min_index"], bounds["max_index"] + 1):
                key = str(dataset[ii]['timestamp'])
                if "dufo_label" not in f[key]:
                    flag_has_dufo_label = False
                    break
        if flag_has_dufo_label and not overwrite:
            print(f"==> Scene {scene_id} has dufo_label, skip.")
            continue

        mydufo = dufomap(0.2, 0.2, 1, num_threads=12)  # resolution, d_s, d_p, hit_extension
        mydufo.setCluster(0, 20, 0.2)  # depth=0, min_points=20, max_dist=0.2

        print(f"==> Scene {scene_id} start, data path: {data_path}")
        for i in tqdm(range(bounds["min_index"], bounds["max_index"] + 1), desc=f"Dufo run: {scene_in_data_index}/{len(all_scene_ids)}", ncols=80):
            if interval != 1 and i % interval != 0 and (i + interval//2 < bounds["max_index"] or i - interval//2 > bounds["min_index"]):
                continue
            data = dataset[i]
            assert data['scene_id'] == scene_id, f"Check the data, scene_id {scene_id} is not consistent in {i}th data in {scene_in_data_index}th scene."
            # HARD CODED: remove points outside the range
            norm_pc0 = np.linalg.norm(data['pc0'][:, :3], axis=1)
            range_mask = (
                (norm_pc0 > MIN_AXIS_RANGE) &
                (norm_pc0 < MAX_AXIS_RANGE)
            )
            pose_array = transform_to_array(data['pose0'])
            mydufo.run(data['pc0'][range_mask], pose_array, cloud_transform=True)

        # finished integrating, start segmenting; needed since we have map.label inside dufo
        mydufo.oncePropagateCluster(if_cluster=True, if_propagate=True)
        for i in tqdm(range(bounds["min_index"], bounds["max_index"] + 1), desc=f"Start Segment: {scene_in_data_index}/{len(all_scene_ids)}", ncols=80):
            data = dataset[i]
            pc0 = data['pc0']
            gm0 = data['gm0']
            pose_array = transform_to_array(data['pose0'])
            dufo_label = np.array(mydufo.segment(pc0, pose_array, cloud_transform=True))
            dufo_labels = np.zeros(pc0.shape[0], dtype=np.uint8)
            dufo_labels[~gm0] = dufo_label[~gm0]

            # save labels
            timestamp = data['timestamp']
            key = str(timestamp)
            with h5py.File(os.path.join(data_path, f'{scene_id}.h5'), 'r+') as f:
                if "dufo_label" in f[key]:
                    # print(f"Warning: {scene_id} {timestamp} has label, will be overwritten.")
                    del f[key]["dufo_label"]
                f[key].create_dataset("dufo_label", data=np.array(dufo_labels).astype(np.uint8))
        print(f"==> Scene {scene_id} finished, used: {(time.time() - start_time)/60:.2f} mins")
    print(f"Data inside {str(data_path)} finished. Check the result with vis() function if you want to visualize them.")

if __name__ == '__main__':
    start_time = time.time()
    # step 1: run dufomap
    fire.Fire(run_dufo)
    # step 2: run cluster on dufo_label
    fire.Fire(run_cluster)

    print(f"\nTime used: {(time.time() - start_time)/60:.2f} mins")
```
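One detail in `run_cluster` worth noting: HDBSCAN marks noise points with label -1, so `hdb.labels_ + 1` shifts noise to 0 and real clusters to 1..k before the `label` dataset is written. A small NumPy sketch of that relabeling (the label values and mask here are made up for illustration):

```python
import numpy as np

# Made-up clusterer output for the dynamic points: -1 marks noise,
# 0..k-1 are cluster ids (stand-in for hdb.labels_).
labels_ = np.array([-1, 0, 0, 1, -1, 2])
# Stand-in for dufo_label == 1: which points were fed to the clusterer.
dynamic_mask = np.array([True, False, True, True, True, True, False, True])

# Shift by 1 so that 0 means "no cluster" in the saved label array;
# static points (mask False) also stay 0.
cluster_label = np.zeros(dynamic_mask.shape[0], dtype=np.int16)
cluster_label[dynamic_mask] = labels_ + 1
print(cluster_label.tolist())  # [0, 0, 1, 1, 2, 0, 0, 3]
```

This gives a single convention downstream: label 0 is "static or unclustered", and any positive value is a dynamic cluster id.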
