Options to do SLAM for video? Get poses and camera intrinsics? #8
Comments
Although the task is a little different, I converted the dust3r result into a NeRF transform matrix and used it to train NeRF.

```python
import json
from collections import OrderedDict

import numpy as np

from dust3r.utils.device import to_numpy

# Axis flip from the dust3r/OpenCV convention to the NeRF/OpenGL one.
OPENGL = np.array([[1, 0, 0, 0],
                   [0, -1, 0, 0],
                   [0, 0, -1, 0],
                   [0, 0, 0, 1]])

focals = scene.get_focals().cpu()
extrinsics = scene.get_im_poses().cpu()
intrinsics = scene.get_intrinsics().cpu()

def make_nerf_transform_matrix_json(focals, intrinsics, extrinsics, scale=1):
    js_file = OrderedDict()
    focals = (to_numpy(focals) * scale).tolist()
    # Recover the image size from the principal point (assumed to be the image center).
    scaled_width = int(intrinsics[0, :, 2][0].item() * scale * 2)
    scaled_height = int(intrinsics[0, :, 2][1].item() * scale * 2)
    aspect_ratio = scaled_width / scaled_height

    # Rotate each extrinsic matrix into the NeRF (OpenGL) space.
    frames = []
    for i, ext in enumerate(to_numpy(extrinsics)):
        ext[:3, 3:] *= scale
        temp_dict = {
            'file_path': 'images/image_%03d.png' % i,
            'transform_matrix': (ext @ OPENGL).tolist(),
            'fl_x': focals[i][0],
            'fl_y': focals[i][0],
        }
        frames.append(temp_dict)

    js_file['w'] = scaled_width
    js_file['h'] = scaled_height
    js_file['cx'] = scaled_width / 2.0
    js_file['cy'] = scaled_height / 2.0
    js_file['frames'] = frames

    # dust3r does not know the distortion params
    js_file['k1'] = 0.0
    js_file['k2'] = 0.0
    js_file['p1'] = 0.0
    js_file['p2'] = 0.0

    with open('./transforms.json', 'w', encoding='utf-8') as file:
        json.dump(js_file, file, ensure_ascii=False, indent='\t')
```

Here, the `scale` variable is the one computed in the `load_image` function in dust3r.utils.image. Unfortunately, the NeRF results with those converted camera poses are terrible.
Hi @alik-git, dust3r is efficient for reconstructing some scenes, but it wouldn't be ideal for SLAM-based tasks unless it is coupled with some classical technique, or perhaps after performing camera calibration and fine-tuning on your own data and camera model.
Hi, Dust3r is wonderful work and I want to get the poses between two images from the real world. So I wonder whether the data from `poses = scene.get_im_poses()` is a relative pose estimate or an absolute pose estimate? Thank you. The data is as follows: poses = tensor([[[ 9.9999e-01, -3.6112e-03, 3.9793e-03, -3.2458e-05],
Thanks a lot!

Hello everyone, I'll start by answering @alik-git's questions: if you started with images of shape 680x1200, they get reshaped to 290x512, that is, a factor of 2.34375, which looks pretty close to what you gave as GT =)
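To make that arithmetic explicit (my own illustration, not from the original reply): dust3r brings the longest image side to 512 and scales the other side by the same factor, so the factor is simply the original long side divided by 512.

```python
# Illustration of the resize factor mentioned above (not dust3r code).
orig_w, orig_h = 1200, 680
factor = max(orig_w, orig_h) / 512                           # 2.34375
resized = (round(orig_w / factor), round(orig_h / factor))   # (512, 290)
print(factor, resized)
```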
@hwanhuh I gave a quick look at your code and it seems you are mistakenly treating the poses as camera projection matrices; see dust3r/dust3r/cloud_opt/optimizer.py, line 134 (at commit 517a430). You should thus invert them to get camera projection matrices.
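In code, that inversion could look like the following sketch (my illustration, assuming `scene.get_im_poses()` returns a batch of 4x4 camera-to-world matrices, as the comment above suggests):

```python
import torch

# Placeholder for the poses returned by scene.get_im_poses() -- assumed here to be
# a batch of 4x4 camera-to-world matrices.
cam2world = torch.eye(4).unsqueeze(0).repeat(12, 1, 1)   # in practice: scene.get_im_poses()
# Inverting gives world-to-camera matrices, i.e. the extrinsic part of a projection matrix.
world2cam = torch.linalg.inv(cam2world)
```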
@MisterEkole it is true that the implicit camera model of DUSt3R is an over-parametrization of standard camera models, which enforces fewer constraints. However, we train only on data captured with "pinhole" camera models, and we show this in additional experiments. The experiments are as follows: we compute the average absolute error of the field-of-view estimates (in degrees) and the average 2D reprojection accuracy (in %) at a threshold of 1% of the image diagonal, both measured on raw predictions for 1000 randomly sampled (unseen) test images from the Habitat, BlendedMVS and CO3D datasets. Note that Habitat and BlendedMVS are synthetically generated, so the intrinsics are perfectly known. For CO3D, we consider approximate focals estimated via COLMAP.
@Leonard-Yao The world2cam matrices are expressed in a common reference frame, although the term "absolute" might be misleading: the poses are not expressed in a metric space, because the reconstruction is performed up to an unknown scale factor.
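(Editorial aside, not part of the original reply: since all poses live in one common frame, a relative pose between two images can be composed directly. A minimal sketch, under the assumption that `scene.get_im_poses()` returns 4x4 camera-to-world matrices:)

```python
import torch

# Poses in the shared (non-metric) reference frame; identity placeholders here,
# in practice: poses = scene.get_im_poses().
poses = torch.eye(4).unsqueeze(0).repeat(2, 1, 1)
i, j = 0, 1
# Pose of camera j expressed in camera i's frame; the translation is only known up to scale.
rel_ij = torch.linalg.inv(poses[i]) @ poses[j]
```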
@vincent-leroy Thank you for your patient reply. After getting the depths of every pixel in the paired pictures, can I figure out the unknown scale factor from the returned 3D points, e.g. factor = real depth / returned depth?
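(A hedged sketch of that idea, with made-up placeholder values and not part of the original exchange: if a few metric depth measurements are available, a robust estimate of the global scale is the median ratio of known to predicted depths.)

```python
import numpy as np

# pred_depths: dust3r depths sampled at a few pixels; real_depths: measured metric
# depths at the same pixels (placeholder values). The median ratio gives a robust
# global scale estimate that can then be applied to the whole depth map.
pred_depths = np.array([1.8, 2.5, 3.1])
real_depths = np.array([0.9, 1.3, 1.6])
scale_factor = float(np.median(real_depths / pred_depths))
print(scale_factor)
```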
Thanks for answering my question @vincent-leroy! Just a follow-up question: what do you think about getting poses? Basically, do you think this can be extended into a full-fledged SLAM system, or would that be a lot of work?
I think it should not be too hard to make a functional SLAM system with it, but some points need to be figured out. We might look into that in the future, but this is not the top priority.
Hello, I would like to ask: if the poses estimated by dust3r are put into NeRF, will there be ghosting/blur in the NeRF renderings?
Not sure why this is closed. @alik-git can you reopen? It's a fruitful discussion.
Sure, no problem. Reopening.
I'm a NeRF researcher and I'm very interested in applying dust3r to NeRF modeling.
This looks really interesting. |
Is it possible to get the focals and intrinsics for in-the-wild samples, assuming the values only need to be close to the real values rather than exact?
For a SLAM system, I wonder how to maintain a consistent scale, especially when the scene changes (like going from outdoor to indoor).
The latest update of spann3r now supports camera parameter estimation.
And now there is MASt3R-SLAM, in case someone comes upon this issue in the future.
Hi there, congrats on the fantastic work! These are amazing results.
I'm working on 3D mapping systems for robotics, and was wondering: given a video, can this method help with obtaining the camera parameters and poses for each frame?
Do you guys have any scripts already for this? I see that in the example usage you have:
And you can do `scene.get_intrinsics()`, which is great, but when I run this on 12 images from the Replica dataset, `scene.get_intrinsics()` outputs 12 different intrinsic matrices, none of which really match the original camera intrinsics of the Replica dataset. Am I doing something wrong? Should I specify the scale or resolution or something else about the images at some point? The Replica images are 1200x600 (w, h) but they get resized to 512, I'm assuming.
Just wondering how I should go about getting the camera parameters for a monocular rgb video, or if that's not really possible to do super accurately yet with this method.
For extra detail, I'm using the following frames from the Replica dataset, and the output of `scene.get_intrinsics()` is as follows (I'm only showing two of the matrices here, not all 12), compared to the ground-truth camera params of the Replica dataset from the camera_params.json. Here is the actual camera_params.json file in case it helps.

Also, just curious, how would I go about running this on long videos? Or is that not possible yet?
My apologies if these are too many questions! This method is really awesome, and I'm having a lot of fun using it. Thanks again for the wonderful work!