Visual Servoing Platform version 3.7.0
Tutorial: Tracking with the Render-Based Tracker

1. Introduction

In ViSP 3.7, we are introducing a new Render-Based Tracker (RBT) that leverages rendering on the GPU to extract geometric information in order to perform online, continuous 6D pose estimation of a known 3D object.

It is an extension of the Model-Based Tracker (MBT) already present in ViSP (see Tutorial: Markerless generic model-based tracking using a color camera). This implementation is derived from the work of [45].

A major advantage over the Model-Based Tracker is that the RBT uses a common mesh representation for the object to track. This representation does not require additional preprocessing for feature selection and is widely supported in software such as Blender. It can also be obtained through photogrammetry processes, minimizing the work required before using the tracker. Note that we do not consider deformable objects, only non-articulated rigid objects.

Similar to the MBT, tracking is framed as an optimization problem, where the difference between the features (geometric or photometric) extracted from a 3D model and those observed in the image must be minimized.
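In compact form (a notational summary of the steps detailed in 3. Algorithm overview, not an additional formulation), the tracker seeks the pose that minimizes the stacked feature error:

$^{c}\hat{\mathbf{T}}_{o} = \arg\min_{^{c}\mathbf{T}_{o}} \Vert \mathbf{e}(^{c}\mathbf{T}_{o}) \Vert_2^2$

where $\mathbf{e}$ stacks, for every enabled feature type, the difference between the features extracted from the render at pose $^{c}\mathbf{T}_{o}$ and the features observed in the image.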

The RBT is built as a modular pipeline, see 3. Algorithm overview. In practice, this means:

  • It can leverage both RGB and depth information (see 2. Requirements)
  • It features a complete configuration for every part of the tracking pipeline (see 4. Configuration)
  • Most components can be disabled or adapted to different use cases and scenarios
  • Components can be extended: You can add new features or filtering methods (see 7. Extending the RBT).

2. Requirements

2.1. Building the RBT module

To successfully build the RBT module, you will need:

  • To compile with a C++ standard of at least C++11 (set through the USE_CXX_STANDARD CMake variable)
  • The Panda3D 3rd-party, which is used for 3D rendering. See 2. Panda3D installation for more information on how to install Panda3D on your system. Panda3D is also available through Conda if you are compiling in a virtual environment.

Additionally, some optional dependencies are (strongly) recommended:

  • nlohmann::json (see 4.1.12. Other optional 3rd parties for installation instructions) to load configuration files and save your tracking results. Without it, the tracker will have to be set up through code
  • OpenCV, if you wish to use the vpRBKltTracker, which uses KLT feature tracking for pose estimation.

2.2. General requirements

To use the RBT you will need several things:

  • An OpenGL or DirectX enabled device, capable of performing 3D rendering. You can use software acceleration, although having a GPU is preferable for performance reasons
  • A camera with known intrinsic parameters. For calibration see Tutorial: Camera intrinsic calibration.
    Warning
    The camera intrinsics should follow a model without distortion.
  • If you are using an RGB-D camera, the depth image should be aligned with the RGB image. Some SDKs provide this functionality. For instance, the wrapper around the realsense SDK accepts an "align" parameter in the vpRealSense2::acquire() function. When correctly set, the alignment will automatically be performed for you.
  • A 3D model of the object, in a format that is supported by Panda3D (see 3. Converting a mesh to Panda's file format). If you have installed libassimp-dev or are using the Conda package, Panda3D supports common formats such as .ply and .obj. Otherwise, you will have to convert your mesh using the previously linked method.

2.3. 3D model considerations

There are very few restrictions on the 3D models that can be used with the RBT.

First, your 3D model's size should be expressed in meters. Be aware that some CAD software export models in millimeters.

If you are using the initialization by click, you will have to be careful with the model orientation when exporting (see below).

Note that while 3D meshes support textures, the presently available RBT features do not use the texture information.

To correctly prepare your model, here is a small overview of the steps to follow in Blender.

2.3.1. Preparing your model in Blender

To make sure your model is correct, you should start by setting the scale and orientation of your 3D model.

You should first import your model in Blender. In the top left corner of the Blender window, click on File > Import > (Your model type), then pick your object in the popup window.

Your model should now be imported and visible in the 3D viewer. You should then click on it and press N to bring up the Transform panel in the top right corner. Then you should press Ctrl+A to open the Apply menu and click on the All Transforms item. When you do this, you will not see any change applied to the model itself, but this will ensure that what you see in Blender is what is observed by the tracker, and that both use the same orientation and scale.

Apply transforms to clear any rotation and scale difference stored in Blender.

If you are using initialization by click as described in 4.1. Initialization by user click section, you can select the 3D points to click directly in Blender. To do so, first click on your object and press tab to go into Edit mode. Then, press N to display the Transform Panel (top right of the viewer). Choose a 3D point to use for initialization, then click on it. In the Transform panel, you should then see the XYZ coordinates of the point, which you can copy into the init file. Be sure to select the "Global" frame in the transform panel.

The transform panel with a selected 3D point in Edit mode.

Once you have selected your points make sure to export your model in the same frame as the Blender frame. You can export your 3D model, using the File > Export > (Supported 3D format such as .obj) and save your model in the desired location. In the export panel, set "Forward Axis" to -Z and "Up Axis" to Y.

With all the prerequisites met, you can now understand (if you wish) how the tracker works and how to customize it for your needs.

3. Algorithm overview

This section of the tutorial details how the tracker works.

The RBT is used by building a vpRBTracker object, configuring it and then feeding it RGB or RGB-D frames.

As the RBT is a tracking algorithm, an initial pose must be provided. Given this initial pose, the RBT will continuously track the object in a sequence of frames. This assumes that the motion between consecutive frames is small enough for the tracker to retrieve the features and correctly update the pose of the object in the camera frame. Otherwise, there is a risk of divergence, requiring reinitialization. Initialization can be done via click (see 4.1. Initialization by user click section and the vpRBTracker::initClick() method). Otherwise, you can use a pose estimation algorithm (e.g. MegaPose, described in Tutorial: Tracking with MegaPose, and the vpMegaPose::estimatePoses() method) followed by a call to vpRBTracker::setPose().
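As a minimal sketch (assuming a configured tracker; the file names and the useClickInit flag are placeholders introduced for this example), providing the initial pose can look like this:

vpRBTracker tracker;
tracker.loadConfigurationFile("tracker.json"); // placeholder path, requires JSON support
tracker.startTracking();

vpImage<unsigned char> Id;   // grayscale image shown in a display window
vpHomogeneousMatrix cMo;     // object pose in the camera frame
bool useClickInit = true;    // hypothetical flag, only for this sketch
if (useClickInit) {
  tracker.initClick(Id, "object.init", true); // user clicks the 3D points listed in object.init
  tracker.getPose(cMo);
}
else {
  // cMo obtained from an external pose estimator (e.g. MegaPose)
  tracker.setPose(cMo);
}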

Roughly the algorithm works as follows:

  1. Given an initial pose of the object in the camera frame $^{c}\mathbf{T}_{o}$, generate a render of the object at this pose.
  2. Extract and process render data, and store it into a vpRBRenderData object.
  3. From the renders, seek features to match with in the current RGB and depth images.
  4. From the match between image and model features, formulate an error $\mathbf{e}$.
  5. Iteratively minimize $\Vert\mathbf{e}\Vert_2$ by updating $^{c}\mathbf{T}_{o}$.

3.1. Rendering

Object rendering is done through Panda3D. The object is rendered at the last computed pose.

All the information derived from the render is stored into a vpRBRenderData, in the current frame data (vpRBFeatureTrackerInput). This object contains information such as:

  • The 3D surface normals, expressed as 3D unit vectors
  • The depth map, in meters (Rendered using vpPanda3DGeometryRenderer)
  • The object silhouette (extracted using vpPanda3DDepthCannyFilter and vpRBTracker::extractSilhouettePoints)
  • The image area containing the object (represented by a vpRect)
  • The pose at which the object was rendered: when implementing new features or algorithms, it is preferable to use this pose when extracting data from the render, rather than the pose arguments passed to the methods you override.
Rendering is used to extract geometric information

For more information on how rendering is performed, you can examine the vpPanda3DGeometryRenderer and vpPanda3DDepthCannyFilter classes mentioned above.

3.2. Trackable features

This section details the different features that can be used to track the object. The RBT is flexible, and each feature can be added or removed, depending on your scenario and available sensor.

For each feature, a weighting function is computed at every iteration, and a user specified weight can be used to consider certain features as more or less important than others.

All features inherit from a base class, vpRBFeatureTracker, that defines the different functionalities that should be implemented for a feature to be trackable.

To add or interact with the features of the RBT, you can use vpRBTracker::addTracker() and vpRBTracker::getFeatureTrackers().
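For illustration, here is a sketch of adding features in code (under the assumption that vpRBTracker::addTracker() accepts a std::shared_ptr to a default-constructed feature tracker; tracker is the vpRBTracker instance):

// Silhouette moving-edge feature
std::shared_ptr<vpRBSilhouetteMeTracker> me = std::make_shared<vpRBSilhouetteMeTracker>();
me->setTrackerWeight(1.0);
// Dense depth feature
std::shared_ptr<vpRBDenseDepthTracker> depth = std::make_shared<vpRBDenseDepthTracker>();
depth->setTrackerWeight(1.0);

tracker.addTracker(me);
tracker.addTracker(depth);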

3.2.1. Feature list

Feature | Class | Required inputs | Note
KLT 2D points | vpRBKltTracker | Grayscale image | Requires OpenCV
Silhouette moving edges | vpRBSilhouetteMeTracker | Grayscale image | -
Silhouette color edges | vpRBSilhouetteCCDTracker | Color image | More expensive than the silhouette moving-edge tracker
Depth map (point-to-plane) | vpRBDenseDepthTracker | Depth image | -
XFeat keypoints | RBXFeatFeatureTracker | Color image | Only available in Python
  • KLT points Rely on the vpKltOpencv class to extract and track 2D KLT points from the current image, which are then associated with points on the 3D model. The error to minimize is then the reprojection error between the 2D points in the current image and the reprojection of their associated 3D points.
  • Silhouette moving edges Rely on the rendering to first extract the object depth disparities, which are hypothesized to create disparities in the luminance image as well. For each point of the depth contour, the most likely edge in its neighborhood in the luminance map is detected using the moving edge framework.
  • Silhouette color edges Similar to the silhouette moving edges, the depth disparity is first used to extract the outer silhouette contours. Then, based on the Contracting Curve Density algorithm, color statistics are computed inside and outside the silhouette and a per-pixel error is computed. This error is minimized when the difference between the inner and outer color distributions is maximized (i.e., there is a clear difference between the object's color and the environment). Unlike the moving edge approach, the error is not geometric in nature, but rather photometric.
  • Depth The depth information can be used and compared to the rendered depth. With this feature, the error to minimize is the point-to-plane distance. The point is sampled from the current depth map, while the plane is computed from the rendered 3D model, using its distance to the camera and surface normal. The plane is continuously updated to minimize the distance to the sampled depth point.
  • XFeat keypoints Robust, deep learning based keypoints. The error criterion is the same as for the KLT points, but unlike KLT, XFeat relies on descriptors that are stored in a map in association with 3D points. These points can be matched even in cases where there are large motions and across time.

To define your own features (advanced), see 7.1. Defining your own features.

While you can use all the different features together, here are some common rules of thumb:

  • KLT features tend to work better when there is some texture information on the model itself. However, they are also worth a try even on models with little texture.
  • Avoid using the KLT tracker by itself. Indeed, the 3D points associated with tracked 2D points are estimated from the model at a given timestep. If this model is even slightly incorrect, this will result in drift over time. This drift can be compensated by the other features, which do not have this 2D-3D correspondence step.
  • Similarly, the depth features are ill-suited in two scenarios. First, when tracking an object where depth estimation is complex. This can be the case when using an infrared based depth camera and tracking a black object that reflects very little light. The second scenario is when tracking movements that are perpendicular to the camera plane with very low depth variation. An example of this is a cube that is fully facing the camera and that is translated left to right. In this case, depth information should be combined with other features to track.
  • Contour-based features are important for correct pose estimation. Color edges tend to recover from larger motions than the moving edge-based features, but may result in more jitter. In most cases, it is not useful to combine the two. One case that can be of interest is to use the color silhouette for large motions, and more moving-edge points to recover the finer motion. See the configuration sections 4.2.4. Silhouette extraction settings and 4.2.7.3. vpRBSilhouetteCCDTracker for information on how to accomplish this.
  • Note that the color silhouette features only rely on the outer object silhouette, while the moving edges can also capture and leverage the depth disparities inside the object. If you have strong depth changes on your object, it can be interesting to use the moving edges with a correct depth threshold.

3.3. Feature filtering

When tracking, it is possible for occlusions or lighting artifacts to appear. In those cases, considering ill-defined or ill-matched features may lead to tracking failure. To handle those cases and improve tracking reliability, the RBT comes with two mechanisms:

  • A masking/segmentation approach, that outputs a per-pixel probability map that a given pixel belongs to the object
  • Robust error criteria. Each feature defined above uses an M-Estimator (through vpRobust) to reweigh the errors to be minimized and reduce the influence of outliers.

While M-estimators are always used, the masking approach is entirely optional. To enable masking, you should call vpRBTracker::setObjectSegmentationMethod() with the method that you wish to use. To disable it, pass nullptr as input argument. As of now, the only method available is vpColorHistogramMask. This method computes time-smoothed histograms of the object's and background's color distributions, and compares their output probabilities for a given pixel color to compute an object membership score. If available, this method may use camera depth information to update the object histogram using only pixels that are reliable in terms of depth (where object and render are closely aligned). This method is ideal when there is a clear photometric distinction between the object and its environment.
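For instance (a sketch assuming vpRBTracker::setObjectSegmentationMethod() takes a std::shared_ptr, as the nullptr case above suggests):

// Enable the color histogram mask
std::shared_ptr<vpColorHistogramMask> mask = std::make_shared<vpColorHistogramMask>();
tracker.setObjectSegmentationMethod(mask);
// ... later, disable masking entirely
tracker.setObjectSegmentationMethod(nullptr);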

You may define your own approach to object segmentation (by, e.g., using a neural network to compare feature maps or using more depth information) by inheriting from vpObjectMask.

Note that the probability map output by the segmentation method is used differently by each type of feature. For instance, contour-based methods will search for strong probability differences along the contour normal, while point-based features may only look in a small neighborhood to check whether a given area belongs to the object.

The diagram below sums up the combination of the masking and outlier rejection steps.

Masking is first used to remove features in cases such as strong occlusions. Remaining outliers are filtered through the use of robust estimators.

3.4. Minimization problem

To update the camera pose, the RBT implements tracking as a non-linear optimization problem. The error to be minimized is the difference between features extracted from the 3D model that have been matched with features in the current images. For each feature, we must compute the visual error as well as its relationship to the pose (the Jacobian).

The tracking flow is as follows:

  • For each tracked feature type
    • Extract features from the renders. Potentially use segmentation mask to reject unreliable data
    • Match/track features in the image acquired by the camera.

Then, optimization is an iterative process:

  • While optimization has not converged and the maximum number of iterations has not been reached:
    • For each feature:
      • Update the feature representation (extracted from the model) with the new pose $^{c}\mathbf{M}_{o}$
      • Compute the feature error
      • Compute the feature Jacobian
    • Compute the pose displacement, by performing a weighted combination of the feature errors and Jacobians
    • Update $^{c}\mathbf{M}_{o}$
    • Check for convergence.

To influence the importance of each feature type, the user can tweak its weight by calling vpRBFeatureTracker::setTrackerWeight().

There are several options to tweak the behavior of the optimization process, which is based on a Levenberg-Marquardt approach. See 4. Configuration for more information.
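As a rough illustration only (the exact stacking and weighting of the per-feature terms is implementation specific, so this is the general shape of a Levenberg-Marquardt visual servoing update rather than the exact expression used by the RBT), each iteration computes a velocity-twist update of the form

$\mathbf{v} = -\lambda \left(\mathbf{J}^\top \mathbf{W} \mathbf{J} + \mu\,\mathrm{diag}(\mathbf{J}^\top \mathbf{W} \mathbf{J})\right)^{-1} \mathbf{J}^\top \mathbf{W}\, \mathbf{e}$

where $\lambda$ is the gain, $\mu$ the Levenberg-Marquardt regularization (rescaled by muIterFactor at every iteration), $\mathbf{J}$ the stacked feature Jacobians, $\mathbf{W}$ the diagonal matrix combining the M-estimator and user-defined feature weights, and $\mathbf{e}$ the stacked feature errors; $^{c}\mathbf{M}_{o}$ is then updated through the exponential map of $\mathbf{v}$.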

3.5. Detecting tracking failure

Tracking may of course fail. It is thus important to have some measure of confidence in the tracking process.

If drift detection is enabled, the RBT will compute a confidence metric for the estimated object pose.

It is then your responsibility to use this score to call for a tracker reinitialization.

To enable drift detection, you should call vpRBTracker::setDriftDetector() with the scoring method that you wish to use.

The method that is provided by default, vpRBProbabilistic3DDriftDetector, is an online method that compares the photometric information of the model in the current image with a representation that is learned online. If available, it will also compare the depth in the render data with the depth from the camera. This representation is a set of 3D points, for which some color statistics are stored and updated every frame.

This method will output a probability estimate that the model is correctly tracked. The confidence metric is thus between 0 and 1.

This method relies on framing color and depth information as Gaussian distributions. Their spread is controlled by the user and a larger spread will result in a greater confidence and smoother changes in the confidence score.

In addition, since color estimation is performed online, the variance of a point's color distribution is used as a factor to weigh its importance with respect to the other samples when computing the final score. This means that more importance will be given to points that have a stable color (points that lie in a smooth color region of the model and that were not too impacted by lighting changes in the previous frames).
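In code, enabling the default drift detector and reacting to its score could look like the following sketch (assuming vpRBTracker::setDriftDetector() accepts a std::shared_ptr; the 0.25 threshold is purely illustrative):

std::shared_ptr<vpRBProbabilistic3DDriftDetector> drift = std::make_shared<vpRBProbabilistic3DDriftDetector>();
tracker.setDriftDetector(drift);
// ... after each call to tracker.track(...)
if (drift->getScore() < 0.25) {
  // Confidence is low: consider reinitializing (initClick() or setPose() from an external estimator)
}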

4. Configuration

To ease the use of the RBT, it is strongly recommended to have compiled ViSP with the 3rd-party that enables JSON support (see JSON for modern C++ installation instructions). This section describes how to configure the tracker.

4.1. In-code configuration

If you wish to (or must) configure the tracker directly from code, the main methods of interest are vpRBTracker::setModelPath(), vpRBTracker::setCameraParameters(), vpRBTracker::addTracker(), vpRBTracker::setObjectSegmentationMethod() and vpRBTracker::setDriftDetector().
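Put together, an in-code setup could look like the following sketch (same assumptions as in the previous snippets: the setters listed above take std::shared_ptr arguments where a component is expected; the path and resolution values are placeholders):

vpCameraParameters cam;                  // intrinsics without distortion (see 4.2.2)
unsigned int height = 480, width = 640;  // illustrative image resolution
vpRBTracker tracker;
tracker.setModelPath("data/dragon/dragon.bam");
tracker.setCameraParameters(cam, height, width);
tracker.addTracker(std::make_shared<vpRBSilhouetteMeTracker>());
tracker.addTracker(std::make_shared<vpRBDenseDepthTracker>());
tracker.setObjectSegmentationMethod(std::make_shared<vpColorHistogramMask>());
tracker.setDriftDetector(std::make_shared<vpRBProbabilistic3DDriftDetector>());
tracker.startTracking(); // initializes the renderer and loads the 3D model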

4.2. JSON configuration

To load a JSON configuration file, you can call vpRBTracker::loadConfigurationFile().

This subsection details the parameters that you can tweak through JSON.

4.2.1. General parameters

Field Type Description
model string Optional path to the 3D model of the object. If not present, it should be set using vpRBTracker::setModelPath() before calling vpRBTracker::startTracking()
displaySilhouette Boolean Whether to display the object silhouette from the last rendered frame when calling vpRBTracker::display(). Note that since this is the last rendered frame, it may appear laggy.
updateRenderThreshold float Optional. The motion threshold between the last render and the current object pose above which the object should be rerendered and the renders updated. By default, this metric is specified in meters. This threshold is also used when rerendering after odometry. Set this value to 0 to always perform rendering.
camera Dictionary Optional camera intrinsics. See 4.2.2. Camera configuration.
vvs Dictionary Parameters for optimization. See 4.2.3. Optimization parameters.
silhouetteExtractionSettings Dictionary Parameters for the contour extraction from the renders. See 4.2.4. Silhouette extraction settings.
mask Dictionary Optional segmentation method configuration. See 4.2.5. Object segmentation method configuration.
drift Dictionary Optional drift detection method configuration. See 4.2.6. Drift detection method configuration.
features List List of features to track. See 4.2.7. Feature configuration.

4.2.2. Camera configuration

Field Type Description
width int Image width
height int Image Height
intrinsics Dictionary Camera intrinsics, as detailed below

Intrinsics definition:

Field Type Description
px double Focal length to pixel width ratio
py double Focal length to pixel height ratio
u0 double Principal point horizontal pixel coordinate (along the image width)
v0 double Principal point vertical pixel coordinate (along the image height)
model string Should be set to "perspectiveWithoutDistortion"
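If you prefer to set these values in code rather than in JSON, the equivalent is the following sketch (values are illustrative; the vpCameraParameters constructor takes px, py, u0, v0 and yields a model without distortion):

vpCameraParameters cam(600.0, 600.0, 320.0, 240.0); // px, py, u0, v0
tracker.setCameraParameters(cam, 480, 640);          // height, width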

4.2.3. Optimization parameters

Example configuration without defining a stopping criterion for optimization:

"vvs": {
"gain": 1.0,
"maxIterations": 10,
"mu": 0.0,
"muIterFactor": 0.0
}

And with a criterion (stop if 3D motion is below 1/10th of a millimeter between two iterations)

"vvs": {
"gain": 1.0,
"maxIterations": 10,
"mu": 0.0,
"muIterFactor": 0.0,
"convergenceThreshold": 0.0001
}
Field Type Description
gain double Optimization gain. A gain too small may lead to drift issues, while a gain too high may lead to instabilities in pose.
maxIterations int Maximum number of optimization iterations.
mu double Initial amount of regularization for the Levenberg-Marquardt optimizer.
muIterFactor double Scaling factor to the mu parameter applied at every optimization iteration
convergenceThreshold double Motion threshold (in meters) below which the tracking is considered to have converged. This conditions the number of iterations for optimization. Set it to 0 or leave it undefined to run optimization for the maximum number of iterations.

4.2.4. Silhouette extraction settings

Example configuration:

"silhouetteExtractionSettings": {
"threshold": {
"type": "relative",
"value": 0.1
},
"sampling": {
"samplingRate": 1,
"numPoints": 512,
"reusePreviousPoints": true
}
}

To control the silhouette processing, there are two main settings: threshold and sampling.

The threshold settings determine whether a point is a silhouette point. Silhouette extraction is based on the depth disparity between neighboring render pixels: if there is a strong depth disparity, the point is considered as belonging to the silhouette.

The threshold can either be defined as an absolute value, or relative to the object's depth extent (the difference between the farthest and closest object points as seen from the camera).

Field Type Description
type string Either "relative" or "absolute". If relative, specified as a fraction of the difference between the clipping planes. If absolute, specified in meters.
value double Value of the threshold.

The thresholding parameters are used to determine whether a point belongs to the silhouette, but processing the whole silhouette may be too cumbersome as there can be thousands of contour pixels. Thus, we can sample the silhouette to obtain a subset of points. Sampling has a strong impact on the runtime performance of the tracker.

Field Type Description
samplingRate int This is the step (in pixels) that is used to sample the silhouette map. A value of 1 will examine every pixel. Higher values are susceptible to aliasing artifacts, but may provide more even sampling across the silhouette.
numPoints int Maximum number of points to keep. If set to 0, then all points are kept.
reusePreviousPoints boolean Whether to try and reuse the same silhouette points across tracking iterations. This may help improve stability and reduce jitter.

4.2.5. Object segmentation method configuration

The use of a segmentation method is defined by the mask key in the main configuration dictionary. Which type of method is used depends on the value of the "type" key of the mask dictionary.

4.2.5.1. vpColorHistogramMask configuration

Example configuration:

{
"type": "histogram",
"bins": 32,
"objectUpdateRate": 0.1,
"backgroundUpdateRate": 0.1,
"maxDepthError": 0.01,
"computeOnlyOnBoundingBox": false
}
Field Type Description
type string Set to "histogram" to use this method.
bins int Should be a power of 2 (e.g., 16 or 32). Determines the number of bins used per color component for the object and background histograms. Fewer bins will lead to less noise in the pixel map, but may lead to a less confident map.
objectUpdateRate float Object histogram update rate. Should be between 0 and 1. A value of 1 means that only the last frame will be used in the histogram. Values below will keep a memory of the last seen frames.
backgroundUpdateRate float Background histogram update rate. Same as objectUpdateRate.
maxDepthError float When depth information is available, the rendered depth and the actual depth are compared. For a given pixel of the object, if the depth difference is too high, the pixel is used to compute the background histogram instead of the object histogram. This helps alleviate the influence of occlusions on the mask. This parameter controls the maximum tolerated depth difference, expressed in meters.
computeOnlyOnBoundingBox boolean Whether to compute the mask only in the image area containing the object. If true, this will save computation time, but may lead to preserving silhouette contours that should not be kept.

4.2.6. Drift detection method configuration

4.2.6.1. vpRBProbabilistic3DDriftDetector configuration

Example configuration:

{
"type": "probabilistic",
"colorUpdateRate": 0.25,
"initialColorSigma": 25.0,
"depthSigma": 0.025,
"filteringMaxDistance": 0.001,
"minDistanceNewPoints": 0.005
}
Field Type Description
type string Set to "probabilistic" to use this method.
colorUpdateRate float Update rate for the color statistics of each point on the object's surface.
initialColorSigma float Standard deviation (R,G,B) used to compute the error function between the observed color and the estimated color distribution.
depthSigma float Standard deviation used to compute the error function between the observed pixel depth and rendered depth.
filteringMaxDistance float Maximum tolerated difference between the rendered depth and the stored point's distance to the camera, used to check whether a 3D point is visible from the camera. This value is in meters and should be left small.
minDistanceNewPoints float Each frame, this method tries to sample new 3D points on the object. This parameter controls the sampling density, and ensures that a new candidate point is far enough from all the other stored points from previous iterations.

4.2.7. Feature configuration

The type of feature that is used depends on the "type" value of each feature dictionary.

The "features" object of the main dictionary is a list, as multiple features can be used.

For each feature, these values can be defined:

Field Type Description
weight float or vpTemporalWeighting The importance given by the user to this feature type.
display boolean Whether the features should be displayed when calling vpRBTracker::display().

For weighting, it can be of interest for certain features to give them more importance at the end of optimization, when the pose should be close to the correct solution. Indeed, certain features such as silhouette contours or depth features exhibit a limited convergence domain (depending on their settings) but, unlike keypoints, do not suffer from modelling inaccuracies. As such, they are ideal to correct the small remaining error that keypoints cannot remove.

To do so, you can use a sigmoid-like weighting scheme for such features. This scheme modifies the weight during the optimization (called every frame). The configuration is the following:

Field Type Description
minWeight float The minimum weight given to the feature (either at the start or end of optimization, depending on the increasing flag)
maxWeight float The maximum weight given to the feature (either at the start or end of optimization, depending on the increasing flag)
midpointLocation float The point of the optimization at which the weight reaches the midpoint between minWeight and maxWeight. It is specified as a fraction of the number of iterations (between 0 and 1)
slopePower int How fast the weighting transitions from minWeight to maxWeight
increasing boolean Whether the weight should increase (go from minWeight to maxWeight) or decrease (maxWeight to minWeight)
Different values for temporal weighting.

Example configurations:

For a feature whose importance does not change during optimization:

{
"weight": 1.0,
"display": false
}

For a feature that is more important at the end of optimization (a feature that has a more local convergence domain):

{
"weight": {
"minWeight": 0.0,
"maxWeight": 1.0,
"midpointLocation": 0.5,
"slopePower": 4,
"increasing": true
},
"display": false
}

And accordingly, a feature whose importance decreases (such as features that have a large convergence domain but that may be inaccurate):

{
"weight": {
"minWeight": 0.0,
"maxWeight": 1.0,
"midpointLocation": 0.5,
"slopePower": 4,
"increasing": false
},
"display": false
}

4.2.7.1. vpRBKltTracker

Warning
This feature requires that ViSP has been built with OpenCV as a 3rd party (see OpenCV installation instructions). Otherwise, loading this JSON configuration will raise an error about encountering an unknown tracker.

Example configuration:

{
"type": "klt",
"weight": 1,
"useMask": true,
"minMaskConfidence": 0.5,
"maxReprojectionErrorPixels": 5.0,
"newPointsMinPixelDistance": 4,
"minimumNumPoints": 20,
"blockSize": 5,
"useHarris": true,
"harris": 0.05,
"maxFeatures": 500,
"minDistance": 5.0,
"pyramidLevels": 3,
"quality": 0.01,
"windowSize": 5
}
Field Type Description
useMask boolean Whether, if available, the segmentation mask should be used to filter wrong features.
minMaskConfidence float If useMask is true, minimum confidence for a feature to be accepted.
maxReprojectionErrorPixels float Maximum reprojection error, above which a tracked KLT point is rejected for future tracking iterations.
newPointsMinPixelDistance float During KLT resampling, minimum Euclidean distance (in pixels) that a candidate point should have to all other points to be considered for tracking.
minimumNumPoints int Minimum number of points that should be tracked at a given timestep. If the number of tracked points falls below this value, KLT tracking is fully reinitialized.
Note
For the other settings, see the vpKltOpencv documentation.

4.2.7.2. vpRBSilhouetteMeTracker

Example configuration:

{
"type": "silhouetteMe",
"weight": 1,
"numCandidates": 3,
"useMask": false,
"minMaskConfidence": 0.8,
"movingEdge": {
"maskSize": 7,
"minSampleStep": 1.0,
"mu": [
0.5,
0.5
],
"nMask": 180,
"range": 16,
"sampleStep": 1.0,
"strip": 2,
"thresholdType": "normalized",
"threshold": 20
}
}
Field Type Description
numCandidates int Number of edge locations that are preserved as potential matches for a given silhouette point.
useMask boolean Whether, if available, the segmentation mask should be used to filter wrong features. Here, the segmentation mask is examined along the silhouette normal to find a disparity in the probability mask (marking a sharp distinction between object and background).
minMaskConfidence float If useMask is true, minimum confidence disparity for a feature to be accepted.
movingEdge dictionary Moving edge tracker settings. See vpMe for more information.

4.2.7.3. vpRBSilhouetteCCDTracker

Example configuration for a case with small motions:

{
"type": "silhouetteColor",
"weight": 0.1,
"useMask": true,
"minMaskConfidence": 0.8,
"ccd": {
"h": 16,
"delta_h": 1,
"min_h": 16
}
}

Example configuration to handle larger motions:

{
"type": "silhouetteColor",
"weight": 0.1,
"useMask": true,
"minMaskConfidence": 0.8,
"ccd": {
"h": 64,
"delta_h": 4,
"min_h": 16
}
}
Field Type Description
useMask boolean Whether, if available, the segmentation mask should be used to filter wrong features. Here, the segmentation mask is examined along the silhouette normal to find a disparity in the probability mask (marking a sharp distinction between object and background).
minMaskConfidence float If useMask is true, minimum confidence disparity for a feature to be accepted.
maxNumPoints int Maximum number of silhouette points to keep (resampled) for this tracker. This allows using many points for the ME-based tracker while retaining fewer points for this tracker, which is more expensive. Leave at 0 to disable resampling.
temporalSmoothing float Interpolation factor, between 0 and 1, used to blend a contour point's current color statistics with those computed on the previous frame. This may improve stability, but may also lead to a smaller estimated motion.
ccd dictionary Contracting curve density algorithm settings. See below.

To configure the actual CCD parameters, you can modify:

Field Type Description
h int The initial contour search area. This size (in pixels) defines the length of the normal segment along which color statistics are computed, and will strongly impact the convergence domain of the tracker.
delta_h int Subsampling step when examining the contour normal. This helps reduce the tracking time when h is large. Using a high value (relative to h) can lead to instabilities.
min_h int Parameter to set if you wish to use an adaptive scaling for the silhouette search area. If set, the initial search area will be set to h. Then, at each optimization iteration, if motion is smaller than a multiple of h, then h is halved, down to a minimum value of min_h. Note that if delta_h is set, then it is also halved in order to obtain more reliable statistics.

4.2.7.4. vpRBDenseDepthTracker

Example configuration:

{
"type": "depth",
"weight": 1,
"useMask": true,
"minMaskConfidence": 0.5,
"step": 2,
"display": false
}
Field Type Description
useMask boolean Whether, if available, the segmentation mask should be used to filter wrong features.
minMaskConfidence float If useMask is true, minimum confidence for a feature to be accepted.
step int Step in pixels to sample the render map. A low value for this parameter will lead to a larger number of depth features. As this step is used when iterating on rows and columns of the render images, the number of depth features will quickly decrease as the step grows.

4.3. Python-specific configuration

Warning
The features defined below are only available when using the Python bindings. To install and use the bindings see Tutorial: Building ViSP Python bindings from source.

When using the Python bindings, more options are available. These are defined in modules/python/bindings/visp/python/rbt.

You can load Python features from a JSON configuration. To do so, you can place the Python configuration in a JSON object under the "python_ext" key.

Your config file should look like:

{
"features": [ // CPP features
...
],
...,
"python_ext": {
"features": [
...
],
"odometry": {
...
}
}
}

For more information see modules/python/bindings/visp/python/rbt/PythonRBExtensions.py

An example of RBT using Python is available under modules/python/examples/realsense-rbt.py

4.3.1. XFeat-based options

We provide a wrapper around XFeat keypoints. XFeat keypoints are robust and allow for reliable matching, even in the presence of large motion blur and image differences. They are far more powerful than the KLT-based features.

XFeat keypoints are matched with 3D points that are stored in a map (see vpPointMap). This map contains either object points (expressed in the object frame) or points expressed in the environment frame. When a new image arrives, keypoints are extracted, then matched with the points in the map. The correspondences are then used to minimize a visual error in order to estimate the object or camera motion. The visual error can be 2D (classical IBVS) or 3D (point minimization in metric space). After minimization, incorrect matches are detected and removed from the 3D point map, and new points are added.

Warning
As XFeat is based on deep learning, having a GPU is preferable. Running XFeat on the CPU is still possible.

We run XFeat on the whole image, and the keypoints can be used both for object tracking and for camera motion estimation via visual odometry. To use the XFeat-based features, the Python settings should contain the XFeat feature extraction settings:

"xfeat": {
"numPoints": 8192,
"minCos": 0.95,
"useDense": false,
"onlyOnBB": false,
"minObjMaskValue": 0.5,
"minSilhouetteDistanceEnvPoint": 20.0,
"minSilhouetteDistanceObjectPoint": 1.0
},
Field Type Description
numPoints int The maximum number of points that are extracted in the frame
minCos float The minimum matching score for two keypoints (extracted from two different images) to be matched. Use a high value (e.g. 0.9) to limit the number of matches.
useDense boolean Whether to use dense keypoint extraction and matching (Xfeat*)
onlyOnBB boolean Whether to run keypoint extraction only in the object region of the image. Do not use this option if you wish to use odometry.
minObjMaskValue float Minimum mask value (if masking is enabled) for a keypoint to be considered as belonging to the object. If the mask value is below, it is considered an environment keypoint (that may be used for odometry).
minSilhouetteDistanceEnvPoint float Distance in pixels that an environment keypoint should have to the object's silhouette for it to be a valid keypoint that can be used for odometry. Useful to reject keypoints that could actually belong to the object in scenarios where large motions happen.
minSilhouetteDistanceObjectPoint float Distance in pixels that an object keypoint should have to the object's silhouette for it to be a valid keypoint that can be used to estimate object motion. Useful to only consider keypoints that lie in the textured part of the object and not on its contour, where keypoints may rely on the background and thus provide unreliable matches.

To enable the XFeat feature for object tracking, you should add the following to the "python_ext" object configuration:

"features": [
{
"type": "xfeat",
"weight": 1.0,
"use_3d": false,
"display": true,
"numPoints": 8192,
"reprojectionThreshold": 5,
"minDistNewPoint": 0.0,
"maxDepthErrorVisible": 0.01,
"maxDepthErrorCandidate": 0.005
}
]
Field Type Description
use_3d boolean Whether to use the 3D error criterion (requires an RGB-D camera) instead of the 2D one
numPoints int The maximum number of 3D points to store in the object map. These points are those considered for matching with the current object keypoints.
reprojectionThreshold float The maximum reprojection error for a 3D keypoint at the end of a tracking iteration before it is considered an outlier and removed from the map.
minDistNewPoint float Minimum distance (in meters) that a new point should have to all the points currently stored in the map for it to be considered as a valid candidate for addition.
maxDepthErrorVisible float Maximum depth error (in meters) between the rendered depth and the stored 3D point depth before this point is considered as not visible from the current viewpoint. Mainly used to filter self-occlusion.
maxDepthErrorCandidate float Maximum depth error (in meters) before a new keypoint addition is rejected. Used to filter points that do not belong to the object or for which the actual 3D error is too great.

To perform odometry using XFeat, you can add these settings:

"odometry": {
"type": "xfeat",
"use3d": false,
"numPoints": 8192,
"reprojectionThreshold": 10.0,
"minDistNewPoint": 0.0,
"maxDepthErrorVisible": 0.02,
"maxDepthErrorCandidate": 0.0,
"gain": 0.5,
"maxNumIters": 20,
"muInit": 0.0,
"muIterFactor": 0.1,
"minImprovementFactor": 0.001
}
Field Type Description
use_3d boolean Whether to use the 3D error criterion instead of the 2D one
numPoints int The maximum number of 3D points to store in the object map. These points are those considered for matching with the current object keypoints.
reprojectionThreshold float The maximum reprojection error for a 3D keypoint at the end of a tracking iteration before it is considered an outlier and removed from the map.
minDistNewPoint float Minimum distance (in meters) that a new point should have to all the points currently stored in the map for it to be considered as a valid candidate for addition.
maxDepthErrorVisible float Maximum depth error (in meters) between the actual camera depth and the stored 3D point depth before this point is considered as not visible from the current viewpoint. Used to filter occlusions.
maxDepthErrorCandidate float Maximum depth error (in meters) before a new keypoint addition is rejected.
Warning
XFeat odometry requires an RGBD camera.

5. Tracker usage

5.1. On prerecorded sequences

In order to test the tracker, we provide a prerecorded sequence of images named dragon.mp4 and the dragon object, whose CAD model file dragon.bam is provided in the ViSP source code.

The result of tracking the dragon can be obtained by going to the build folder:

~/visp-build $ cd tutorial/tracking/render-based

then launching:

$ ./tutorial-rbt-sequence --color data/dragon/dragon.mp4 --tracker data/dragon/dragon.json

Note that:

More explanation is provided in the next section.

5.2. With a realsense camera

We will now focus on another example usage of the tracker, using a RealSense camera.

Note that:

To run the program, start by placing yourself in the ViSP build directory. Assuming you have compiled and built the tutorials, navigate to the tutorial folder:

~/visp-build $ cd tutorial/tracking/render-based

You can then examine the command line arguments with:

~/visp-build/tutorial/tracking/render-based $ ./tutorial-rbt-realsense -h

Which outputs:

Program description: Tutorial showing the usage of the Render-Based tracker with a RealSense camera
Arguments:
 --config Path to the JSON configuration file. Values in this file are loaded, and can be overridden by command line arguments.
 Optional

 --debug-display Enable additional displays from the renderer
 Default: false

 --fps Realsense requested framerate
 Default: 60
 Optional

 --height Realsense requested image height
 Default: 480
 Optional

 --init-file Path to the JSON file containing the 2D/3D correspondences for initialization by click
 Default: ""
 Optional

 --max-depth-display Maximum depth value, used to scale the depth display
 Default: 1.0
 Optional

 --no-display Disable display windows
 Default: true

 --object Name of the object to track. Used to potentially fetch the init file
 Default: ""
 Optional

 --plot-cov Plot the pose covariance trace for each feature
 Default: false

 --plot-divergence Plot the metrics associated to the divergence threshold computation
 Default: false

 --plot-pose Plot the pose of the object in the camera frame
 Default: false

 --plot-position Plot the position of the object in a 3d figure
 Default: false

 --pose Initial pose of the object in the camera frame.
 Default: []
 Optional

 --profile Enable the use of Pstats to profile rendering times
 Default: false

 --save Whether to save experiment data
 Default: false

 --save-path Where to save the experiment log. The folder should not exist.
 Default: ""
 Optional

 --save-video Whether to save the video
 Default: false

 --tracker Path to the JSON file containing the tracker
 Default: ""
 Optional

 --video-framerate Output video framerate
 Default: 30
 Optional

 --width Realsense requested image width
 Default: 848
 Optional

Example JSON configuration file:

{
 "--debug-display": false,
 "--fps": 60,
 "--height": 480,
 "--init-file": "",
 "--max-depth-display": 1.0,
 "--no-display": true,
 "--object": "",
 "--plot-cov": false,
 "--plot-divergence": false,
 "--plot-pose": false,
 "--plot-position": false,
 "--pose": [],
 "--profile": false,
 "--save": false,
 "--save-path": "",
 "--save-video": false,
 "--tracker": "",
 "--video-framerate": 30,
 "--width": 848
}

You can see that among those arguments, the program has two main parameters. The first, --tracker, is the path to the .json configuration file, defined following the details given in the 4. Configuration section. The second, --object, is the path to the 3D model (in a format readable by the Panda3D engine). Alternatively, it can be specified in the .json file.

Here, we will run the tutorial with the following arguments:

~/visp-build/tutorial/tracking/render-based $ ./tutorial-rbt-realsense \
 --tracker data/dragon/dragon_realsense.json \
 --object data/dragon/dragon.bam \
 --init-file data/dragon/dragon.init

The file dragon_realsense.json contains the following configuration:

{
"vvs": {
"gain": 1.0,
"maxIterations": 10,
"mu": 0.0,
"muIterFactor": 0.1,
"convergenceThreshold": 0.001
},
"updateRenderThreshold": 0.001,
"model": "data/dragon/dragon.bam",
"silhouetteExtractionSettings": {
"threshold": {
"type": "relative",
"value": 0.1
},
"sampling": {
"samplingRate": 1,
"numPoints": 512,
"reusePreviousPoints": true
}
},
"mask": {
"type": "histogram",
"bins": 32,
"objectUpdateRate": 0.1,
"backgroundUpdateRate": 0.1,
"maxDepthError": 0.01,
"computeOnlyOnBoundingBox": false
},
"drift": {
"type": "probabilistic",
"colorUpdateRate": 0.25,
"initialColorSigma": 25.0,
"depthSigma": 0.025,
"filteringMaxDistance": 0.001,
"minDistanceNewPoints": 0.005
},
"features": [
{
"type": "silhouetteColor",
"weight": 0.1,
"useMask": true,
"minMaskConfidence": 0.8,
"ccd": {
"h": 64,
"delta_h": 4,
"min_h": 16
}
},
{
"type": "depth",
"weight": 1,
"display": false,
"step": 2,
"useMask": true,
"minMaskConfidence": 0.8
},
{
"type": "klt",
"weight": 1,
"useMask": true,
"minMaskConfidence": 0.5,
"maxReprojectionErrorPixels": 5.0,
"newPointsMinPixelDistance": 4,
"minimumNumPoints": 20,
"blockSize": 5,
"useHarris": true,
"harris": 0.05,
"maxFeatures": 500,
"minDistance": 5.0,
"pyramidLevels": 3,
"quality": 0.01,
"windowSize": 5
}
]
}
Warning
Note that to perform your own tests, you will have to rebuild tutorial-rbt-realsense every time you make a change to the .json file located in the source directory, so that the changes are reflected in the build directory.

Let's now have a look at the tutorial code that will perform the tracking. We recall that the full code is given in tutorial-rbt-realsense.cpp.

The code first starts by parsing the command line arguments, detailed above, using structures defined in render-based-tutorial-utils.h:

// Read the command line options
vpRBTrackerTutorial::BaseArguments baseArgs;
CmdArguments realsenseArgs;
vpRBTrackerTutorial::vpRBExperimentLogger logger;
vpRBTrackerTutorial::vpRBExperimentPlotter plotter;
"Tutorial showing the usage of the Render-Based tracker with a RealSense camera",
"--config", "/"
);
baseArgs.registerArguments(parser);
realsenseArgs.registerArguments(parser);
logger.registerArguments(parser);
plotter.registerArguments(parser);
parser.parse(argc, argv);
baseArgs.postProcessArguments();
plotter.postProcessArguments(baseArgs.display);

We start by declaring an instance of the tracker and loading the given JSON configuration file.

std::cout << "Loading tracker: " << baseArgs.trackerConfiguration << std::endl;
tracker.loadConfigurationFile(baseArgs.trackerConfiguration);
if (!baseArgs.modelPath.empty()) {
tracker.setModelPath(baseArgs.modelPath);
}

Depending on the configuration, the call to vpRBTracker::loadConfigurationFile() may raise an error. For instance, the KLT tracker requires OpenCV: if you did not compile ViSP with OpenCV as a third party, an error will be raised.

If successful, we then open a connection to the RealSense camera, requesting a specific configuration. This configuration depends on the input arguments (width, height and framerate):

const unsigned int width = realsenseArgs.width, height = realsenseArgs.height;
const unsigned fps = realsenseArgs.fps;
vpRealSense2 realsense;
std::cout << "Opening realsense with settings: " << width << "x" << height << " @ " << fps << "fps" << std::endl;
rs2::config config;
config.enable_stream(RS2_STREAM_COLOR, width, height, RS2_FORMAT_RGBA8, fps);
config.enable_stream(RS2_STREAM_DEPTH, width, height, RS2_FORMAT_Z16, fps);
rs2::align align_to(RS2_STREAM_COLOR);
try {
realsense.open(config);
}
catch (const vpException &e) {
std::cout << "Caught an exception: " << e.what() << std::endl;
std::cout << "Check if the Realsense camera is connected..." << std::endl;
return EXIT_FAILURE;
}
const float depthScale = realsense.getDepthScale(); // used to convert uint16_t to meters

Once the camera is opened, we update the tracker with the camera's actual intrinsics. While camera intrinsics and image resolution can be declared in the configuration file, using the intrinsics provided by the RealSense is more accurate and ensures that there is no configuration mismatch. We then call vpRBTracker::startTracking() to initialize the renderer, load the 3D model and perform other setup-related operations.

// Retrieve the RealSense intrinsics for the color stream (model without distortion)
vpCameraParameters cam = realsense.getCameraParameters(RS2_STREAM_COLOR, vpCameraParameters::perspectiveProjWithoutDistortion);
tracker.setCameraParameters(cam, height, width);
tracker.startTracking();
Warning
It is at this point that some errors related to an incorrect intrinsics configuration may be raised. Additionally, if you do not have an OpenGL-capable device or if Panda3D cannot open a window, an error may be raised. If so, please see 6. Troubleshooting issues.

We then proceed and declare the different images that will be needed by the tutorial. Some images contain raw information and will be fed to the tracker. Others will be used for display purposes.

vpImage<vpRGBa> Icol(height, width); // Color image
vpImage<unsigned char> Id(height, width); // Grayscale image, converted from Icol
vpImage<uint16_t> depthRaw(height, width); // Raw depth map, in realsense format
vpImage<float> depth(height, width); // Depth map, in meters
// Display versions of raw image data
vpImage<unsigned char> IdepthDisplay(height, width);
vpImage<unsigned char> IProbaDisplay(height, width);
vpImage<unsigned char> cannyDisplay(height, width);
vpImage<vpRGBa> InormDisplay(height, width);

We then create the different display windows with

if (baseArgs.display) {
displays = vpRBTrackerTutorial::createDisplays(Id, Icol, IdepthDisplay, IProbaDisplay);
if (baseArgs.debugDisplay) {
displaysDebug = vpDisplayFactory::makeDisplayGrid(1, 2,
0, 0,
20, 20,
"Normals in object frame", InormDisplay,
"Depth canny", cannyDisplay
);
}
plotter.init(displays);
}

Once the setup is finished, we wait for the user to set up their environment and position the object (here, the dragon) in front of the camera.

if (baseArgs.display && !baseArgs.hasInlineInit()) {
bool ready = false;
while (!ready) {
realsense.acquire((unsigned char *)Icol.bitmap, (unsigned char *)depthRaw.bitmap, nullptr, nullptr, &align_to);
updateDepth(depthRaw, depthScale, baseArgs.maxDepthDisplay, depth, IdepthDisplay);
if (vpDisplay::getClick(Id, false)) {
ready = true;
}
else {
vpTime::wait(1000.0 / fps);
}
}
}

When the user has clicked the display window, initialization can be performed.

// Manual initialization of the tracker
std::cout << "Starting init" << std::endl;
if (baseArgs.hasInlineInit()) {
tracker.setPose(baseArgs.cMoInit);
}
else if (baseArgs.display) {
tracker.initClick(Id, baseArgs.initFile, true);
tracker.getPose(cMo);
}
else {
throw vpException(vpException::notImplementedError, "Cannot initialize tracking: no initial pose provided or display to perform click initialization.");
}
std::cout << "Starting pose: " << vpPoseVector(cMo).t() << std::endl;

If the initial pose is given as a command line argument, tracking can directly proceed. Otherwise, the user must click on different pixel positions that correspond to known 3D points, defined in an initialization file located next to the configuration file or the mesh. In this case, the initialization file is located at data/dragon/dragon.init and contains:

4
-0.0476 0.0276 0.011 # tip of the tongue
0.00435 0.046124 -0.01048 # tip of the crest
0.053011 0.0229 0.002352 # Tip of the tail
0.02405 -0.023972 0.018323 # Leftmost of the back left foot

Once initialization has been performed and a starting pose given, we can enter the tracking loop. At every iteration, the same simple steps are performed:

We start by acquiring the most recent color and depth frames.

realsense.acquire((unsigned char *)Icol.bitmap, (unsigned char *)depthRaw.bitmap, nullptr, nullptr, &align_to);
updateDepth(depthRaw, depthScale, baseArgs.maxDepthDisplay, depth, IdepthDisplay);

The alignment parameter ensures that the depth map is aligned with the color frame, as required by the RBT. The call to updateDepth() converts from the raw uint16_t representation to a depth map in meters. We also convert the RGB image to a grayscale image using vpImageConvert::convert().
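The grayscale conversion is a single call, with Icol and Id being the color and grayscale images declared earlier:

vpImageConvert::convert(Icol, Id); // RGBa to grayscale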

We can then call the tracker with the different frames:

double trackingStart = vpTime::measureTimeMs();
vpRBTrackingResult result = tracker.track(Id, Icol, depth);
double trackingEnd = vpTime::measureTimeMs();
tracker.getPose(cMo);

As an output, we get a vpRBTrackingResult object. This object contains information about what the tracker has done, the number of features that were used, etc. One important piece of information is the reason why the tracking call returned, which we examine in the following snippet:

switch (result.getStoppingReason()) {
case vpRBTrackingStoppingReason::EXCEPTION:
{
std::cout << "Encountered an exception during tracking, pose was not updated!" << std::endl;
break;
}
case vpRBTrackingStoppingReason::NOT_ENOUGH_FEATURES:
{
std::cout << "There were not enough feature to perform tracking!" << std::endl;
break;
}
case vpRBTrackingStoppingReason::OBJECT_NOT_IN_IMAGE:
{
std::cout << "Object is not in image!" << std::endl;
break;
}
case vpRBTrackingStoppingReason::CONVERGENCE_CRITERION:
{
std::cout << "Convergence criterion reached:" << std::endl;
std::cout << "- Num iterations: " << result.getNumIterations() << std::endl;
std::cout << "- Convergence criterion: " << *(result.getConvergenceMetricValues().end() - 1) << std::endl;
break;
}
case vpRBTrackingStoppingReason::MAX_ITERS:
{
break;
}
default:
{ }
}
const std::shared_ptr<vpRBDriftDetector> driftDetector = tracker.getDriftDetector();
if (driftDetector) {
if (driftDetector->getScore() < 0.25) {
std::cout << "Drift detection has low confidence score: " << driftDetector->getScore() << std::endl;
}
}

We use this information to check whether:

  • There was an exception
  • The object is present in the image (if not, no information can be extracted from the render and reinitialization should be performed)
  • There were enough features to compute pose updates
  • The tracking has converged to a given pose (i.e., motion during optimization has become small enough to be considered as noise)
  • Or the tracking has used the maximum allocated number of optimization iterations.

In the first three cases, this means that tracking should be reinitialized.

After that, we check (if it has been computed) the tracking confidence score. If it is low, we emit a warning. In a more complete application, the pipeline would contain an automatic reinitialization procedure.
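Such a procedure could look like the following sketch (purely illustrative: the stopping-reason cases and the 0.25 score threshold reuse values shown in this tutorial, and the actual reinitialization strategy is up to the application):

bool needsReinit = false;
switch (result.getStoppingReason()) {
case vpRBTrackingStoppingReason::EXCEPTION:
case vpRBTrackingStoppingReason::NOT_ENOUGH_FEATURES:
case vpRBTrackingStoppingReason::OBJECT_NOT_IN_IMAGE:
  needsReinit = true;
  break;
default:
  break;
}
if (driftDetector && driftDetector->getScore() < 0.25) {
  needsReinit = true;
}
if (needsReinit) {
  // Either ask the user to click again, or query an external pose estimator
  tracker.initClick(Id, baseArgs.initFile, true);
  tracker.getPose(cMo);
}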

We then display the current frame and overlay the tracking information:

if (baseArgs.display) {
if (baseArgs.debugDisplay) {
const vpRBFeatureTrackerInput &lastFrame = tracker.getMostRecentFrame();
vpRBTrackerTutorial::displayCanny(lastFrame.renders.silhouetteCanny, cannyDisplay, lastFrame.renders.isSilhouette);
}
tracker.display(Id, Icol, IdepthDisplay);
vpDisplay::displayFrame(Icol, cMo, cam, 0.05, vpColor::none, 2);
vpDisplay::displayText(Id, 20, 5, "Right click to exit", vpColor::red);
if (driftDetector) {
std::stringstream ss;
ss << "Confidence score: " << std::setprecision(2) << driftDetector->getScore() << std::endl;
vpDisplay::displayText(Id, Id.getHeight() - 40, 5, ss.str(), vpColor::red);
}
if (vpDisplay::getClick(Id, button, false)) {
if (button == vpMouseButton::button3) {
break;
}
}
tracker.displayMask(IProbaDisplay);
vpDisplay::display(IProbaDisplay);
vpDisplay::flush(IdepthDisplay); vpDisplay::flush(IProbaDisplay);
}

On the grayscale image, you can see:

  • If enabled, the moving edge contours. Valid edges are seen in green, while suppressed ones are purple.
  • If enabled, the KLT tracked points, represented as red dots on the object.

On the color image, you can see:

  • The pose of the object in the camera frame, represented as a 3-axis basis
  • If enabled, the contours used by the CCD tracker.

The display also contains the depth image:

  • By default, depth display is disabled as it incurs a large performance penalty and may reduce the framerate. You can enable it by adding "display": true to the "depth" feature object in the tracker configuration.

Finally, you should also see a black and white image, which is the pixel-wise segmentation mask used to filter out features. If no segmentation method is used, this image should be black.

Example displays when tracking with the realsense camera and all features enabled.

We also log timing details in the console. Running the tutorial with the --verbose flag will print timings for each part of the RBT pipeline.

logger.logFrame(tracker, iter, Id, Icol, IdepthDisplay, IProbaDisplay);
std::cout << "Iter " << iter << ": " << round(frameEnd - frameStart) << "ms" << std::endl;
std::cout << "- Tracking: " << round(trackingEnd - trackingStart) << "ms" << std::endl;
std::cout << "- Display: " << round(displayEnd - displayStart) << "ms" << std::endl;
if (baseArgs.verbose) {
std::cout << result.timer() << std::endl;
}
plotter.plot(tracker, (frameEnd - expStart) / 1000.0);

When the user has pressed the right mouse button, we exit the tracking loop and perform a small cleanup:

logger.close();
return EXIT_SUCCESS;

6. Troubleshooting issues

6.1. RBT performance is poor and CPU cores are not used

One possible reason for this is that the RBT makes heavy use of OpenMP to parallelize and split the workload. An invalid OpenMP configuration may lead to a substantial loss in performance.

On Ubuntu 24.04, we have observed that with the GNU OpenMP implementation (used when compiling with gcc/g++), the default settings lead to a slowdown by a factor of 4 to 5.

This is due to an active waiting policy. We recommend using the following OpenMP settings:

$ export GOMP_SPINCOUNT=0

Note that this setting is only kept for the lifetime of your terminal: you may have to set it again (or add it to your shell startup file, e.g. ~/.bashrc) before the next time you start the tracker.

6.2. Cannot open window (hidden or visible)

This error can appear when calling vpRBTracker::startTracking().

It may be useful to check the backend used by Panda3D, which should be displayed when you start the tutorial:

Known pipe types:
 glxGraphicsPipe

You can also get more information about your OpenGL support. On Ubuntu, you can use:

$ glxinfo | grep version
server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
OpenGL core profile version string: 4.6.0 NVIDIA 550.144.03
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL version string: 4.6.0 NVIDIA 550.144.03
OpenGL shading language version string: 4.60 NVIDIA
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 550.144.03
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
 GL_EXT_shader_group_vote, GL_EXT_shader_implicit_conversions,

If you do not have something similar, it may be that your platform has no OpenGL support.

Warning
It is possible that your hardware has OpenGL support, but does not support headless rendering, i.e., rendering without a screen plugged into the GPU. In this case, you should either try to use the p3headlessgl backend or use a virtual screen emulator.
  • To use p3headlessgl, you should first visit this Panda3D documentation link. Then you should edit the Config.prc file to set "load-display" to "p3headlessgl". In Ubuntu, the Config.prc is located in /etc/Config.prc. In Conda, it will be in $CONDA_PREFIX/etc/Config.prc.
  • In Linux distributions, you can use xvfb to emulate an X11 server to render to.
Note that the RBT uses shaders that are compiled for OpenGL core 3.3.

7. Extending the RBT

Warning
This section is under construction and will be improved in the future!

7.1. Defining your own features

7.2. Registering your own component for JSON parsing

As seen in 4. Configuration, the different components of the RBT pipeline can be parsed from a JSON configuration file. This requires a ViSP version that is built with the nlohmann::json third party (see 2.1. Building the RBT module).

This functionality is implemented using a factory pattern as defined in the vpDynamicFactory.

For your own class to be loadable, you should call vpDynamicFactory::registerType with a lambda function that defines how a JSON object is used to configure an object of your type.

For instance, in the vpRBDriftDetectorFactory, an object of type vpRBProbabilistic3DDriftDetector is registered as:

registerType("probabilistic", [](const nlohmann::json &j) {
std::shared_ptr<vpRBProbabilistic3DDriftDetector> p(new vpRBProbabilistic3DDriftDetector());
p->loadJsonConfiguration(j);
return p;
});

This means that when parsing the JSON object's drift field, if the "type" field is equal to "probabilistic", then this function will be called and the JSON object will be used to build a vpRBProbabilistic3DDriftDetector.

This pattern holds for the different types of components (features, segmentation masks, drift detectors, etc.).