Commit 88f49bc

Merge ICP-Flow into codebase (#15)

* cuda(histlib): CUDA histogram library from the ICP-Flow project.
* docs: update README to list ICP-Flow among the official implementations.
* conf(optimization-based): update all config files.
* todo: update the model file; double-check with Yancong and Qingwen to confirm that the ICP-Flow results can be reproduced and tested.
* docs: fix a small typo and start updating the model file.
* feat(icp): core ICP files, tested successfully.
* hotfix(runner): refactor metrics accumulation to use speed buckets.
* Improve the ICP-Flow import error message.
* docs: move to reproduce. As the official review is not finished, another request will be opened once Yancong has time to review/improve.
* fix(runner): val mode adds `valid` to align with the Waymo validation run.
* Also update the master port for multi-program runs.
* revert: the range_bucket check is a range of speed/distance; revert to the previous fix. Runs successfully on all optimization-based methods.

1 parent c5f5746 commit 88f49bc

16 files changed

Lines changed: 1384 additions & 12 deletions

File tree

README.md

Lines changed: 19 additions & 4 deletions
@@ -41,21 +41,22 @@ International Conference on Robotics and Automation (**ICRA**) 2025

European Conference on Computer Vision (**ECCV**) 2024
[ Strategy ] [ Self-Supervised ] - [ [arXiv](https://arxiv.org/abs/2407.01702) ] [ [Project](https://github.com/KTH-RPL/SeFlow) ] → [here](#seflow)

- **DeFlow: Decoder of Scene Flow Network in Autonomous Driving**
  *Qingwen Zhang, Yi Yang, Heng Fang, Ruoyu Geng, Patric Jensfelt*
  International Conference on Robotics and Automation (**ICRA**) 2024
  [ Backbone ] [ Supervised ] - [ [arXiv](https://arxiv.org/abs/2401.16122) ] [ [Project](https://github.com/KTH-RPL/DeFlow) ] → [here](#deflow)

🎁 <b>One repository, All methods!</b>
Additionally, *OpenSceneFlow* integrates the following excellent works: [ICLR'24 ZeroFlow](https://arxiv.org/abs/2305.10424), [CVPR'24 ICP-Flow](https://arxiv.org/abs/2402.17351), [ICCV'23 FastNSF](https://arxiv.org/abs/2304.09121), [RA-L'21 FastFlow3D](https://arxiv.org/abs/2103.01306), [NeurIPS'21 NSFP](https://arxiv.org/abs/2111.01253). (More on the way...)

<details> <summary> Summary of them:</summary>

- [x] [FastFlow3D](https://arxiv.org/abs/2103.01306): RA-L 2021, a basic backbone model.
- [x] [ZeroFlow](https://arxiv.org/abs/2305.10424): ICLR 2024; their pre-trained weights can easily be converted into our format through [the script](tools/zerof2ours.py).
- [x] [NSFP](https://arxiv.org/abs/2111.01253): NeurIPS 2021, 3x faster than the original version thanks to [our CUDA speed-up](assets/cuda/README.md), with the same (slightly better) performance.
- [x] [FastNSF](https://arxiv.org/abs/2304.09121): ICCV 2023. SSL optimization-based.
- [x] [ICP-Flow](https://arxiv.org/abs/2402.17351): CVPR 2024. SSL optimization-based.
- [ ] [EulerFlow](https://arxiv.org/abs/2410.02031): ICLR 2025. SSL optimization-based. Planned; not implemented yet.

</details>

@@ -145,7 +146,7 @@ wget https://huggingface.co/kin-zhang/OpenSceneFlow/resolve/main/flow4d_best.ckp

#### SSF

Extra packages needed for the SSF model:

```bash
pip install mmengine-lite
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
```

@@ -223,6 +224,20 @@ python train.py model=deflowpp save_top_model=3 val_every=3 voxel_size="[0.2, 0.

```bash
wget https://huggingface.co/kin-zhang/OpenSceneFlow/resolve/main/seflowpp_best.ckpt
```

### Optimization-based Unsupervised Methods

For all optimization-based methods, you can run `eval.py`/`save.py` directly to get results without any training. The run can take a very long time, so consider launching it inside tmux. For multi-program runs, the master port can be set via `+master_port=12346`.

```bash
# You can switch to another model by passing its name.
python eval.py model=fastnsf

# or save the result directly
python save.py model=fastnsf
```

## 3. Evaluation

You can view the Wandb dashboard for the training and evaluation results, or upload results to the online leaderboard.

assets/cuda/README.md

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,8 @@ Faster our code in CUDA.

- chamfer3D: 3D Chamfer distance between two point clouds, by Qingwen Zhang, added while she was working on SeFlow.
- mmcv: directly from mmcv, not our code.
- mmdet: Python files only, no compilation needed.
- histlib: from Yancong's [ICP-Flow](https://github.com/yanconglin/ICP-Flow) project.

---

assets/cuda/histlib/__init__.py

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@

import torch

import hist  # the compiled CUDA extension built by setup.py in this directory


def histf(X, Y, min_x, min_y, min_z, max_x, max_y, max_z, len_x, len_y, len_z, mini_batch=8):
    """3D histogram over pairwise translations (X - Y), computed on the GPU."""
    histogram = hist.hist(X.contiguous(), Y.contiguous(),
                          min_x, min_y, min_z,
                          max_x, max_y, max_z,
                          len_x, len_y, len_z,
                          mini_batch)
    return histogram


torch.manual_seed(2022)


def run_test():
    # Two identical clouds, the second shifted by a known offset; the
    # histogram peak should recover the relative translation.
    pts = torch.randn(3, 1000, 3)
    indicators = torch.randint(0, 2, size=(3, 1000, 1))
    pts1 = torch.cat([pts, indicators], dim=-1)
    pts2 = pts1.clone()
    pts2[:, :, 0] += 5.
    pts2[:, :, 1] += -3.
    pts2[:, :, 2] += -0.2

    range_x = 10.
    range_y = 10.
    range_z = 0.5
    thres = 0.1
    bins_x = torch.arange(-range_x, range_x + thres, thres)
    bins_y = torch.arange(-range_y, range_y + thres, thres)
    bins_z = torch.arange(-range_z, range_z + thres, thres)
    print('bins_x: ', bins_x)
    print('bins_z: ', bins_z)
    pts1 = pts1.cuda()
    pts2 = pts2.cuda()
    bins_x = bins_x.cuda()
    bins_y = bins_y.cuda()
    bins_z = bins_z.cuda()

    t_hists = histf(pts1, pts2,
                    -range_x, -range_y, -range_z,
                    range_x, range_y, range_z,
                    len(bins_x), len(bins_y), len(bins_z))
    print('output shape: ', t_hists.shape)
    b, h, w, d = t_hists.shape
    for t_hist in t_hists:
        # Unravel the flat argmax into (x, y, z) bin indices.
        t_argmax = torch.argmax(t_hist)
        print(f't_argmax: {t_argmax}, {t_hist.max()} {h}, {w}, {d}, '
              f'{t_argmax // d // w % h}, {t_argmax // d % w}, {t_argmax % d}')
        print('t_argmax', t_argmax // d // w % h, t_argmax // d % w, t_argmax % d,
              bins_x[t_argmax // d // w % h], bins_y[t_argmax // d % w], bins_z[t_argmax % d])


if __name__ == '__main__':
    print("Pytorch version: ", torch.__version__)
    print("GPU version: ", torch.cuda.get_device_name())

    run_test()
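The test above decodes the histogram peak with integer arithmetic (`t_argmax // d // w % h`, `t_argmax // d % w`, `t_argmax % d`). A plain-Python sketch of the same unraveling (the helper name `unravel_3d` is ours, for illustration only, not part of the repository):

```python
def unravel_3d(flat_index, h, w, d):
    """Convert a flat index over an (h, w, d) tensor into (x, y, z) coordinates,
    mirroring the arithmetic used in run_test() above."""
    z = flat_index % d
    y = (flat_index // d) % w
    x = (flat_index // d // w) % h
    return x, y, z

# For an (h, w, d) = (4, 5, 6) histogram, flat index 100 decodes to:
print(unravel_3d(100, 4, 5, 6))  # -> (3, 1, 4)
```

This is the standard row-major unravel; `torch.unravel_index` expresses the same operation in recent PyTorch versions.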

assets/cuda/histlib/hist.cu

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@

#include <vector>
#include "hist_cuda_core.cuh"

#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <cuda.h>
#include <cuda_runtime.h>

// Based on code by Charles Shang:
// https://github.com/torch/cunn/blob/master/lib/THCUNN/generic/SpatialConvolutionMM.cu

at::Tensor
hist_cuda(const at::Tensor &X, const at::Tensor &Y,
          const float min_x, const float min_y, const float min_z,
          const float max_x, const float max_y, const float max_z,
          const int len_x, const int len_y, const int len_z,
          const int mini_batch)
{
    AT_ASSERTM(X.is_contiguous(), "input tensor has to be contiguous");
    AT_ASSERTM(Y.is_contiguous(), "input tensor has to be contiguous");

    AT_ASSERTM(X.type().is_cuda(), "input must be a CUDA tensor");
    AT_ASSERTM(Y.type().is_cuda(), "input must be a CUDA tensor");

    const int batch = X.size(0);
    const int num_X = X.size(1);
    const int dim = X.size(2);
    const int num_Y = Y.size(1);

    AT_ASSERTM((X.size(0) == Y.size(0)), "batch_X (%d) != batch_Y (%d).", X.size(0), Y.size(0));
    AT_ASSERTM((X.size(2) == Y.size(2)), "dim_X (%d) != dim_Y (%d).", X.size(2), Y.size(2));
    AT_ASSERTM((dim == 4), "dim (%d) != 4; 3 for (x, y, z); 1 for the indicator (padded or not).", dim);

    auto bins = at::zeros({batch, len_x, len_y, len_z}, X.options());

    // Process the batch in chunks of `mini_batch` to bound kernel launch size.
    int iters = batch / mini_batch;
    if (batch % mini_batch != 0)
    {
        iters += 1;
    }

    for (int i = 0; i < iters; ++i)
    {
        int mini_batch_ = mini_batch;
        if ((i + 1) * mini_batch > batch)
        {
            mini_batch_ = batch - i * mini_batch;  // remainder in the last chunk
        }
        AT_DISPATCH_FLOATING_TYPES(X.type(), "hist_cuda_core", ([&] {
            hist_cuda_core(at::cuda::getCurrentCUDAStream(),
                           X.data<scalar_t>() + i * mini_batch * num_X * dim,
                           Y.data<scalar_t>() + i * mini_batch * num_Y * dim,
                           mini_batch_, dim, num_X, num_Y,
                           min_x, min_y, min_z,
                           max_x, max_y, max_z,
                           len_x, len_y, len_z,
                           bins.data<scalar_t>() + i * mini_batch * len_x * len_y * len_z);
        }));
    }

    return bins;
}
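The dispatcher above splits the batch into `mini_batch`-sized chunks, with the final chunk holding whatever remains. A small Python sketch of that loop (the function name `chunk_sizes` is ours, for illustration only):

```python
def chunk_sizes(batch, mini_batch):
    """Mirror of hist_cuda's mini-batch loop: ceil-divide `batch` into
    chunks of `mini_batch`, with the remainder in the last chunk."""
    iters = batch // mini_batch
    if batch % mini_batch != 0:
        iters += 1
    sizes = []
    for i in range(iters):
        size = mini_batch
        if (i + 1) * mini_batch > batch:
            size = batch - i * mini_batch  # last, partial chunk
        sizes.append(size)
    return sizes

print(chunk_sizes(20, 8))  # -> [8, 8, 4]
```

The chunk sizes always sum back to `batch`, which is what guarantees the pointer offsets `i * mini_batch * ...` in the C++ loop never run past the tensors.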

assets/cuda/histlib/hist.h

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@

#pragma once
#include <torch/extension.h>

at::Tensor
hist(const at::Tensor &X, const at::Tensor &Y,
     const float min_x, const float min_y, const float min_z,
     const float max_x, const float max_y, const float max_z,
     const int len_x, const int len_y, const int len_z,
     const int mini_batch);

assets/cuda/histlib/hist_cuda.cpp

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@

#include "hist.h"
#include "hist_cuda.h"

at::Tensor
hist(const at::Tensor &X, const at::Tensor &Y,
     const float min_x, const float min_y, const float min_z,
     const float max_x, const float max_y, const float max_z,
     const int len_x, const int len_y, const int len_z,
     const int mini_batch)
{
    // Dispatch to the CUDA implementation; no CPU fallback is provided.
    if (X.type().is_cuda() && Y.type().is_cuda())
    {
        return hist_cuda(X, Y,
                         min_x, min_y, min_z,
                         max_x, max_y, max_z,
                         len_x, len_y, len_z,
                         mini_batch);
    }
    AT_ERROR("Not implemented on the CPU");
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("hist", &hist, "hist");
}

assets/cuda/histlib/hist_cuda.h

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@

#pragma once
#include <torch/extension.h>

at::Tensor
hist_cuda(const at::Tensor &X, const at::Tensor &Y,
          const float min_x, const float min_y, const float min_z,
          const float max_x, const float max_y, const float max_z,
          const int len_x, const int len_y, const int len_z,
          const int mini_batch);
assets/cuda/histlib/hist_cuda_core.cuh

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@

#include <cstdio>
#include <algorithm>
#include <cstring>

#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>

#include <THC/THCAtomics.cuh>

#define CUDA_KERNEL_LOOP(i, n)                          \
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; \
         i < (n);                                       \
         i += blockDim.x * gridDim.x)

const int CUDA_NUM_THREADS = 1024;
inline int GET_BLOCKS(const int N)
{
    return (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS;
}

template <typename scalar_t>
__global__ void hist_cuda_kernel(const int n,
                                 const scalar_t* X,
                                 const scalar_t* Y,
                                 const int batch, const int dim,
                                 const int num_X, const int num_Y,
                                 const float min_x, const float min_y, const float min_z,
                                 const float max_x, const float max_y, const float max_z,
                                 const int len_x, const int len_y, const int len_z,
                                 scalar_t* bins)
{
    CUDA_KERNEL_LOOP(index, n)
    {
        // One thread per (batch, point-in-X, point-in-Y) pair:
        // launched in parallel over batch * num_X * num_Y.
        const int b = index / num_X / num_Y % batch;
        const int i = index / num_Y % num_X;
        const int j = index % num_Y;

        // The 4th channel is an indicator flag; skip padded points.
        scalar_t flag_x = X[b*num_X*dim + i*dim + 3];
        scalar_t flag_y = Y[b*num_Y*dim + j*dim + 3];
        if (flag_x > 0.0 && flag_y > 0.0)
        {
            scalar_t val_x = X[b*num_X*dim + i*dim + 0] - Y[b*num_Y*dim + j*dim + 0];
            scalar_t val_y = X[b*num_X*dim + i*dim + 1] - Y[b*num_Y*dim + j*dim + 1];
            scalar_t val_z = X[b*num_X*dim + i*dim + 2] - Y[b*num_Y*dim + j*dim + 2];
            if (val_x >= min_x && val_x < max_x && val_y >= min_y && val_y < max_y && val_z >= min_z && val_z < max_z)
            {
                // Half-open bins [): left edge included, right edge excluded.
                int p_x = __float2int_rd((val_x - min_x) / (max_x - min_x) * __int2float_rd(len_x));
                int p_y = __float2int_rd((val_y - min_y) / (max_y - min_y) * __int2float_rd(len_y));
                int p_z = __float2int_rd((val_z - min_z) / (max_z - min_z) * __int2float_rd(len_z));

                int bin_id = b*len_x*len_y*len_z + p_x*len_y*len_z + p_y*len_z + p_z;
                atomicAdd(bins + bin_id, 1);
            }
        }
    }
}

template <typename scalar_t>
void hist_cuda_core(cudaStream_t stream,
                    const scalar_t* X, const scalar_t* Y,
                    const int batch, const int dim,
                    const int num_X, const int num_Y,
                    const float min_x, const float min_y, const float min_z,
                    const float max_x, const float max_y, const float max_z,
                    const int len_x, const int len_y, const int len_z,
                    scalar_t* bins)
{
    const int num_kernels = batch * num_X * num_Y;

    hist_cuda_kernel<scalar_t>
        <<<GET_BLOCKS(num_kernels), CUDA_NUM_THREADS, 0, stream>>>(
            num_kernels,
            X, Y,
            batch, dim,
            num_X, num_Y,
            min_x, min_y, min_z,
            max_x, max_y, max_z,
            len_x, len_y, len_z,
            bins);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
    {
        printf("error in hist_cuda_core: %s\n", cudaGetErrorString(err));
    }
}
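A CPU reference for the kernel's binning rule is useful for sanity-checking the CUDA path. This is a hedged, O(|X|·|Y|) pure-Python sketch (the names `bin_index` and `hist3d` are ours, not part of the repository); it mirrors the half-open `[min, max)` binning `p = floor((val - min) / (max - min) * len)` used in the kernel:

```python
import math

def bin_index(val, lo, hi, n_bins):
    """Mirror of the kernel's rule: p = floor((val - lo) / (hi - lo) * n_bins),
    valid for lo <= val < hi (half-open bins)."""
    return math.floor((val - lo) / (hi - lo) * n_bins)

def hist3d(X, Y, lo, hi, n):
    """Translation histogram over all point pairs. X, Y: lists of
    (x, y, z, flag) tuples; a pair counts only when both flags are > 0."""
    bins = [[[0] * n[2] for _ in range(n[1])] for _ in range(n[0])]
    for xp in X:
        for yp in Y:
            if xp[3] > 0 and yp[3] > 0:
                d = [xp[k] - yp[k] for k in range(3)]
                if all(lo[k] <= d[k] < hi[k] for k in range(3)):
                    px, py, pz = (bin_index(d[k], lo[k], hi[k], n[k]) for k in range(3))
                    bins[px][py][pz] += 1
    return bins

# One valid pair offset by (5, -2.5, 0.125) lands in a single predictable bin;
# the second Y point has flag 0 and is ignored, as in the kernel.
X = [(5.0, -2.5, 0.125, 1)]
Y = [(0.0, 0.0, 0.0, 1), (1.0, 1.0, 0.0, 0)]
h = hist3d(X, Y, lo=(-10.0, -10.0, -0.5), hi=(10.0, 10.0, 0.5), n=(200, 200, 10))
print(h[150][75][6])  # -> 1
```

The offsets here are chosen to be exactly representable in binary floating point, so the bin indices are deterministic; near bin edges, float rounding can shift a sample by one bin relative to the GPU result.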

assets/cuda/histlib/setup.py

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='hist',
    ext_modules=[
        CUDAExtension('hist', [
            # the C++ binding file must be named xxx_cuda.cpp
            "/".join(__file__.split('/')[:-1] + ['hist_cuda.cpp']),
            "/".join(__file__.split('/')[:-1] + ['hist.cu']),
        ]),
    ],
    cmdclass={
        'build_ext': BuildExtension
    },
    version='1.0.1')
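The source lists above locate files relative to `setup.py` via `"/".join(__file__.split('/')[:-1] + [...])`. An equivalent, more portable spelling uses `os.path`; this sketch is ours, and `anchor` is a hypothetical path used purely for illustration:

```python
import os

def source_path(filename, anchor='/repo/assets/cuda/histlib/setup.py'):
    """Resolve a source file relative to the directory containing `anchor`,
    matching what the "/".join(...) expressions in setup.py compute."""
    return os.path.join(os.path.dirname(anchor), filename)

print(source_path('hist_cuda.cpp'))  # -> /repo/assets/cuda/histlib/hist_cuda.cpp
```

Once built (e.g. with `python setup.py build_ext --inplace` or `pip install .` from this directory), the extension is importable as `import hist`, which is what `histlib/__init__.py` relies on.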
