Pipe defects synthetic data generator

A synthetic data generator for producing labeled images

The task of the project:
Our task was to develop a tool that generates artificial images together with their labels (bitmasks and bounding boxes) for use as training datasets for deep convolutional neural networks. The goal was to train these networks to locate and identify geometric and visual pipe defects in images.

Specifics of the project:
Using synthetic data improves the performance of neural networks and other machine learning algorithms, reduces the level of distortion, and decreases the amount of real data required. This saves the client time and money.
Because the generator works with 3D data, it can supply new data not only for traditional computer vision (CV) but also for a range of geometric computer vision tasks (also known as 3D ML or GDL, geometric deep learning). For example, in addition to RGB cameras, spatial scanners and depth cameras (RGB-D) can detect defects invisible to the human eye and reconstruct the analyzed objects.

What we did and how we did it:
We wrote a curve generator for Blender in Python; the curves it produces trace general surface plots. We then implemented the generation of realistic pipe models along these curves in a 3D scene. For each type of pipe and defect there are photorealistic and procedural materials that can be easily adjusted either from the initialization script or from the scene settings. We also set up digital twins of the cameras and on-camera lights that are mounted on the drones used to record real objects. Alongside each photorealistic image, the generator renders a black-and-white image showing the damaged areas; these images are needed to generate the annotations.
Using the Rust programming language, we implemented an algorithm for batch conversion of the black-and-white annotation images into YOLO annotations (text files containing bounding box coordinates and sizes).
With the generator we produced thousands of varied annotated images, which served as a supplement to the real data.
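The conversion from a black-and-white defect image to YOLO label lines can be illustrated with a minimal sketch. It is not the project's actual code: the names PixelBox, mask_to_boxes, and to_yolo_line are hypothetical, and the sketch assumes a row-major 8-bit grayscale mask, a fixed brightness threshold, one box per 4-connected white region, and a class id supplied by the caller. Reading image files and batch processing are left out.

```rust
/// Axis-aligned box in pixel coordinates (bounds are inclusive).
/// Hypothetical helper type, introduced only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PixelBox {
    x_min: usize,
    y_min: usize,
    x_max: usize,
    y_max: usize,
}

/// Returns one bounding box per 4-connected region of pixels brighter than
/// `threshold`. `mask` is a row-major grayscale image of `width * height` bytes.
fn mask_to_boxes(mask: &[u8], width: usize, height: usize, threshold: u8) -> Vec<PixelBox> {
    assert_eq!(mask.len(), width * height, "mask size must match dimensions");
    let mut visited = vec![false; mask.len()];
    let mut boxes = Vec::new();

    for start in 0..mask.len() {
        if visited[start] || mask[start] <= threshold {
            continue;
        }
        // Flood-fill the region with an explicit stack, growing its box as we go.
        let (sx, sy) = (start % width, start / width);
        let mut b = PixelBox { x_min: sx, y_min: sy, x_max: sx, y_max: sy };
        let mut stack = vec![start];
        visited[start] = true;

        while let Some(idx) = stack.pop() {
            let (x, y) = (idx % width, idx / width);
            b.x_min = b.x_min.min(x);
            b.x_max = b.x_max.max(x);
            b.y_min = b.y_min.min(y);
            b.y_max = b.y_max.max(y);

            // Collect the in-bounds left, right, upper, and lower neighbours.
            let mut neighbours = Vec::with_capacity(4);
            if x > 0 { neighbours.push(idx - 1); }
            if x + 1 < width { neighbours.push(idx + 1); }
            if y > 0 { neighbours.push(idx - width); }
            if y + 1 < height { neighbours.push(idx + width); }

            for n in neighbours {
                if !visited[n] && mask[n] > threshold {
                    visited[n] = true;
                    stack.push(n);
                }
            }
        }
        boxes.push(b);
    }
    boxes
}

/// Formats a box as a YOLO label line:
/// `class x_center y_center width height`, all but the class normalized to 0..1.
fn to_yolo_line(class_id: u32, b: &PixelBox, width: usize, height: usize) -> String {
    let (w, h) = (width as f64, height as f64);
    let box_w = (b.x_max - b.x_min + 1) as f64;
    let box_h = (b.y_max - b.y_min + 1) as f64;
    let cx = (b.x_min as f64 + box_w / 2.0) / w;
    let cy = (b.y_min as f64 + box_h / 2.0) / h;
    format!("{} {:.6} {:.6} {:.6} {:.6}", class_id, cx, cy, box_w / w, box_h / h)
}
```

To use it, call mask_to_boxes on each mask, then write one to_yolo_line result per box into a text file named after the corresponding photorealistic image. A 2x2 white square centered in a 4x4 mask, for example, yields a single box whose center and size are all 0.5 after normalization.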

Time frame: 3 weeks

Project team
RnD and CG: Roman Chumak