We were approached by a client in 2020 to help them create a digital twin of their robotic research facility in San Jose.
In order to gather the necessary data to create the digital twin, we captured a point cloud of the space.
The client researches many different kinds of robots, including AMRs (Autonomous Mobile Robots) for moving goods around a facility, as well as stationary robot arms for pick-and-place operations.
The facility includes some office space and cubicles, but mostly consists of open floor space for robot arms and AMRs, plus shelving that lets the AMRs fetch and place products. We wanted to capture the entire facility.
The main purpose of the digital twin is to let the client research computer vision and machine learning methods for training robots, so the goal was to produce a high-fidelity version of the digital twin using the Unity HDRP (High Definition Render Pipeline).
We also wanted to produce a lower-fidelity version that could be used in AltspaceVR or in a standalone mobile app.
More Point Cloud Data Is Crucial
In order to create a digital twin, you need measurements.
You need spatial measurements so you can recreate the geometry in 3D, and you need visual references so you can create the texture maps needed to give the 3D model color.
The more measurements you take, the more accurate you can make your model.
So a team was sent to the facility with a GeoSLAM ZEB Go device, which was used to capture a point cloud scan of the space. The device also captured simultaneous video synced to the point cloud capture.
Additional reference images and video were captured using cell phones. We also had blueprints of the facility, and models for the robot arms were provided by their respective manufacturers via URDF files or other CAD models.
Five point clouds were captured using the GeoSLAM device, totalling 1.5 GB of point cloud data. In addition to capturing point clouds, videos, and still images of the facility, our team also measured some of the more critical items with a tape measure.
The team spent about six hours prepping for and capturing the space. When combined and synced with the videos captured of the same areas, there was over 196 GB of data in total. Another four or five hours were spent combining and managing the data after the initial scanning session.
Too Much Point Cloud Data Can Be Bad
While having a lot of data is great in terms of being able to produce an accurate model, it’s not so good when it comes to creating a simulation that can perform in real time. In order to provide a proper simulation environment, we needed to turn the point clouds and reference images into an accurate mesh with photorealistic materials so it would be usable for machine learning with computer vision.
We also needed to create accurate collision models for all of the scanned geometry. Raw meshes are a very poor choice for collision models, and in most simulations the collision models are more important than the visual model. A game engine's performance is determined more by the physics than by the rendering, and using meshes as collision models can dramatically affect performance. The thousands of polygons generated for a floor increase the burden on the physics engine by a huge factor.
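One common way to replace a scanned mesh with a cheap collision primitive is to fit a bounding box to the points belonging to a box-like object. The sketch below is a minimal, hypothetical example using numpy: the crate dimensions and point count are made up, and a real pipeline would first segment the object's points out of the cloud.

```python
import numpy as np

def box_collider_from_points(points):
    """Fit an axis-aligned bounding box to a cluster of scanned points.

    Returns (center, extents): a single box primitive the physics engine
    can test in constant time, instead of a thousands-of-triangles mesh.
    """
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    center = (lo + hi) / 2.0
    extents = hi - lo
    return center, extents

# Hypothetical input: noisy points scanned from a 1 m x 2 m x 0.5 m crate.
rng = np.random.default_rng(0)
crate = rng.uniform([0.0, 0.0, 0.0], [1.0, 2.0, 0.5], size=(10_000, 3))
center, extents = box_collider_from_points(crate)
```

The recovered center is close to (0.5, 1.0, 0.25) and the extents close to (1, 2, 0.5), so one box stands in for the whole scanned surface.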
We also probably want to do some segmentation on the data.
Segmentation means figuring out which parts of the data correspond to which ‘feature’.
It’s common practice to work out which points in the point cloud are the floor, which are the ceiling, and which are walls, windows, doors, etc.
Not only do we want to identify them, but in some cases we want to remove them.
It’s unlikely we’ll be able to scan most places completely empty of people, so we’ll need to remove anyone who happened to walk across the room while we were scanning.
There are also a lot of things which we might want to remove so we can replace them later with better models from elsewhere, and also take advantage of repetition.
If we have a bunch of identical shelves, it’s better to model a single shelf and instance it everywhere than to give each one its own unique polygons, as an automated reconstruction would.
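A very simple form of the floor segmentation described above can be sketched with a height threshold: everything within a few centimeters of the lowest point is labeled floor. This is a crude stand-in for the plane-fitting methods (e.g. RANSAC) a production pipeline would use, and the tolerance and sample data below are hypothetical.

```python
import numpy as np

def segment_floor(points, tol=0.05):
    """Label points within `tol` meters of the lowest height as floor.

    Assumes the floor is flat, horizontal, and the lowest surface in
    the scan; real scans would use plane fitting instead.
    """
    points = np.asarray(points, dtype=float)
    floor_height = points[:, 2].min()
    return points[:, 2] <= floor_height + tol

# Hypothetical scan: 100 floor points near z = 0, 50 shelf points near z = 1.5.
rng = np.random.default_rng(1)
floor_pts = np.column_stack([rng.random(100), rng.random(100), rng.random(100) * 0.02])
shelf_pts = np.column_stack([rng.random(50), rng.random(50), 1.5 + rng.random(50) * 0.02])
cloud = np.vstack([floor_pts, shelf_pts])
mask = segment_floor(cloud)
```

Points flagged by `mask` can then be deleted and replaced with a clean two-triangle floor plane.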
Too Much Point Cloud Data Is Ugly
Unfortunately, automated methods to produce meshes from point clouds do not produce accurate enough results for our client's purposes.
Meshes produced from point clouds don’t have crisp edges and suffer from a ‘melted wax’ effect. Meshes generated from point clouds also don’t have accurate texture maps.
And unless the crew brings the equipment into every nook and cranny, point clouds can’t see behind or under things so the geometry there can only be guessed at.
They also have a lot more polygons than are really needed. Consider a floor for example. A floor could be modeled using two triangles if it is truly flat.
A floor generated by a point cloud scan however consists of thousands of polygons. This is very bad for the physics engine!
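The two-triangle floor mentioned above can be written down directly. The dimensions here are hypothetical; the point is that two triangles fully describe a flat floor, where an automated reconstruction would emit thousands.

```python
import numpy as np

# A hypothetical 20 m x 30 m flat floor as exactly two triangles.
vertices = np.array([
    [0.0,  0.0, 0.0],   # back-left corner
    [20.0, 0.0, 0.0],   # back-right corner
    [20.0, 30.0, 0.0],  # front-right corner
    [0.0,  30.0, 0.0],  # front-left corner
])
# Triangles as indices into `vertices`, wound counter-clockwise
# so both face normals point up (+z).
triangles = np.array([
    [0, 1, 2],
    [0, 2, 3],
])

def face_normal(tri_vertices):
    """Unit normal of a triangle from the cross product of two edges."""
    n = np.cross(tri_vertices[1] - tri_vertices[0],
                 tri_vertices[2] - tri_vertices[0])
    return n / np.linalg.norm(n)

normals = [face_normal(vertices[t]) for t in triangles]
```

Both normals come out as (0, 0, 1), confirming a single flat, upward-facing plane the physics engine can handle trivially.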
Photogrammetry is another method of creating meshes, this time from images, but it requires many thousands of images and a lot of computer processing time, and the results still suffer from the same inaccuracies as a point cloud mesh.
How To Turn Point Cloud Scans Into Digital Twins
The key to turning the point cloud scans and image references into a digital twin turns out to be old school game level design skills.
An experienced tech artist was given the point clouds, images and reference material and asked to use them to produce the 3D model.
In Blender, they used the point cloud scan as a reference guide and recreated the facility with traditional 3D modeling tools.
Because the reference scan was captured using a highly accurate LIDAR-based device, the resulting digital twin is also very accurate. And because the digital twin was created by hand, it is much more efficient for a physics engine to deal with than a mesh created using current automated point-cloud-to-mesh tools.
Our tech artist was also able to assign the appropriate materials to each surface as they built the model – something which most automated point cloud conversion processes do very poorly, if at all. Our artist could select from dozens of common materials such as metal, plastic, glossy paint, matte paint, stucco, carpet, etc. and get a physically accurate visual rendering of that kind of surface.
Another thing that our artist was able to do that would be very difficult for an automated process was to create LOD (Level of Detail) versions of everything that was scanned. These are versions of the model with fewer polygons that are used when viewing from a greater distance to improve performance. Without them, the scene would take five to ten times as long to render and the frame rate would go down dramatically.
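The LOD mechanism described above amounts to picking a model variant based on camera distance. Here is a minimal sketch of that selection logic; the distance thresholds are hypothetical, and a real engine (Unity's LOD Group, for instance) handles this automatically once the artist authors the LOD meshes.

```python
def select_lod(distance, thresholds=(10.0, 30.0)):
    """Pick an LOD index from camera distance in meters.

    With the hypothetical thresholds here: LOD0 (full detail) under
    10 m, LOD1 under 30 m, LOD2 (coarsest) beyond that.
    """
    for lod, limit in enumerate(thresholds):
        if distance < limit:
            return lod
    return len(thresholds)

# A nearby shelf renders at full detail; a distant one uses few polygons.
near, mid, far = select_lod(5.0), select_lod(15.0), select_lod(100.0)
```

Here `near`, `mid`, and `far` come out as 0, 1, and 2, so only objects close to the camera pay the full polygon cost.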
Point clouds are a valuable tool for helping to create digital twins of building interiors but they can’t do the job alone. Reference photos and videos are also helpful, but even using photogrammetry will not suffice to create an accurate digital twin that can be used for robotic simulations.
Consider a desk for example. We were recently asked to model a desk for simulation purposes and there is no way that a digital scan could produce a model that could be used in a ROS environment. The desk had several drawers, and a scan of the desk would never show the interior of the drawers or reveal how they functioned.
For the desk to work in a simulation, it needs to be modeled in separate pieces for each of the moving parts, built into a hierarchy that allows each drawer to slide in and out, but not up and down or side to side.
The drawer needs limits on how far in or out it can slide, how much friction is involved, how much it weighs, and how that weight is distributed. The collision model for the drawer needs to be accurate so a simulated robot can put something into it or take something out.
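In URDF (the robot description format mentioned earlier), that drawer constraint maps to a prismatic joint. The fragment below is an illustrative sketch: the link names, origin, travel limits, and dynamics values are all hypothetical, not taken from the actual desk model.

```xml
<!-- Hypothetical URDF fragment: a desk drawer as a prismatic joint. -->
<joint name="drawer_slide" type="prismatic">
  <parent link="desk_body"/>
  <child link="drawer"/>
  <origin xyz="0 0.25 0.6" rpy="0 0 0"/>
  <!-- Slides along the y axis only; no vertical or sideways play. -->
  <axis xyz="0 1 0"/>
  <!-- Fully closed at 0 m, fully open at 0.45 m of travel. -->
  <limit lower="0.0" upper="0.45" effort="50" velocity="0.5"/>
  <!-- Damping and friction approximate a real drawer slide. -->
  <dynamics damping="2.0" friction="1.5"/>
</joint>
```

The hierarchy (parent link, child link) plus the single axis and limit range is exactly what gives the drawer its in-and-out motion and nothing else.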
That’s why for the foreseeable future, a skilled human will need to be in the loop to create useful digital twins.