The secret weapon against AI’s biggest weakness
Mantis Biotech is advancing a new approach to one of artificial intelligence’s most persistent challenges – the lack of high-quality data in complex, real-world domains. By combining large language models with physics-based simulation, the company is building “digital twins” of humans – virtual, predictive models that replicate anatomy, physiology, and behavior.
While modern AI systems have achieved significant progress across fields such as genomics, diagnostics, and drug discovery, their performance depends heavily on access to large, well-labeled datasets. In many critical areas – including rare diseases, specialized medical conditions, and emerging scientific domains – such data is limited, fragmented, or inaccessible due to privacy and regulatory constraints.
Mantis Biotech’s platform addresses this constraint by reframing data generation as a simulation problem. Instead of relying solely on collected datasets, the system uses known physical laws to expand small sets of observations into large, structured, and causally consistent training data. A single observed state can be treated as an initial condition, which is then evolved through physics-based models to generate a wide range of plausible scenarios.
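To make the "initial condition" idea concrete, here is a minimal sketch that assumes a toy stand-in for the physics: a damped pendulum, loosely analogous to a swinging limb. It is not Mantis Biotech's simulator; the function names, parameter ranges, and the choice of label are all hypothetical. One observed state is evolved under many plausible physical parameters, and each resulting trajectory carries a label computed from the simulation itself.

```python
import math
import random


def simulate_pendulum(theta0, omega0, length, damping, dt=0.01, steps=500, g=9.81):
    """Evolve a damped pendulum with semi-implicit Euler steps.

    Returns a list of (angle, angular_velocity) tuples, starting state included.
    """
    theta, omega = theta0, omega0
    trajectory = [(theta, omega)]
    for _ in range(steps):
        # Angular acceleration from gravity plus a linear damping term.
        alpha = -(g / length) * math.sin(theta) - damping * omega
        omega += alpha * dt
        theta += omega * dt
        trajectory.append((theta, omega))
    return trajectory


def expand_observation(theta0, omega0, n_samples=100, seed=0):
    """Turn one observed state into many simulated, labeled trajectories."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        # Hypothetical ranges for the unknown physical parameters.
        length = rng.uniform(0.3, 0.6)    # metres
        damping = rng.uniform(0.05, 0.5)  # 1/s
        trajectory = simulate_pendulum(theta0, omega0, length, damping)
        # The label is read off the simulated trajectory, not annotated by hand.
        peak_speed = max(abs(omega) for _, omega in trajectory)
        dataset.append({
            "length": length,
            "damping": damping,
            "trajectory": trajectory,
            "peak_speed": peak_speed,
        })
    return dataset


# A single observation: the pendulum released from rest at 0.4 rad.
dataset = expand_observation(theta0=0.4, omega0=0.0, n_samples=100)
print(len(dataset), "trajectories,", len(dataset[0]["trajectory"]), "states each")
```

Fixing the seed makes the generated dataset reproducible, which matters when synthetic data is later audited or regenerated.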
The system operates through a multi-stage pipeline that integrates automated data collection, AI-driven ingestion of multi-modal inputs, LLM-based orchestration, domain-specific physics simulation, and high-fidelity rendering. The result is synthetic data that is not only realistic in appearance but also grounded in the underlying mechanics of the domain, enabling more reliable training of machine learning models.
This approach differs fundamentally from traditional data augmentation techniques, which typically rely on surface-level transformations such as rotation, cropping, or color adjustments. It also diverges from generative models such as GANs or diffusion models, which can produce realistic outputs but offer no guarantee of physical consistency. By contrast, Mantis generates entirely new states that reflect real-world dynamics, with labels derived directly from the simulation process.
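The difference can be illustrated with a small sketch, assuming a toy free-fall model in place of a real domain simulator; the function names and numbers are hypothetical and do not come from Mantis Biotech. A surface-level transformation (rescaling the observed heights) produces a sequence that looks similar but no longer obeys gravity, while re-running the simulation from a new initial height yields a new state that does.

```python
G = 9.81  # m/s^2, gravitational acceleration assumed by the toy model


def simulate_fall(h0, dt=0.01, steps=50):
    """Heights (m) of an object dropped from rest at h0, sampled every dt seconds."""
    return [h0 - 0.5 * G * (k * dt) ** 2 for k in range(steps)]


def implied_acceleration(heights, dt=0.01):
    """Average acceleration implied by a height sequence (central second differences)."""
    second_diffs = [
        (heights[k + 1] - 2 * heights[k] + heights[k - 1]) / dt ** 2
        for k in range(1, len(heights) - 1)
    ]
    return sum(second_diffs) / len(second_diffs)


observed = simulate_fall(10.0)

# Augmentation-style variant: rescale the observed values by 1.3.
augmented = [1.3 * h for h in observed]

# Simulation-style variant: start from a new initial height and re-run the physics.
simulated = simulate_fall(13.0)

for name, series in [("observed", observed), ("augmented", augmented), ("simulated", simulated)]:
    print(f"{name:>9}: starts at {series[0]:.1f} m, "
          f"implied acceleration {implied_acceleration(series):.2f} m/s^2")
```

Both variants start at 13 m, but the rescaled copy implies an acceleration 1.3 times stronger than gravity, whereas the simulated sequence remains consistent with the physics that generated it.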
In practice, generating and managing synthetic data at scale often requires additional infrastructure. QuData’s services can support this with efficient, scalable pipelines for creating high-quality synthetic datasets from simulated environments or statistical models. The QuData team specializes in dataset design, consistent annotation and labeling, quality validation, bias mitigation, and integration of synthetic data with real-world datasets.
Mantis Biotech has already applied its technology in professional sports, where digital twins of athletes are used to model performance over time and predict potential injuries. By integrating data such as motion capture, training load, and biometric signals, the system enables detailed longitudinal analysis of physical behavior.
Beyond sports, the potential applications span a wide range of industries, including healthcare, robotics, epidemiology, and scientific research. The platform is designed to be modular and domain-agnostic, supporting multiple data types such as images, video, audio, text, and structured inputs. This flexibility allows it to operate across environments where traditional machine learning approaches struggle due to limited or heterogeneous data.
Mantis’s framework is also complementary to existing AI methodologies. Rather than modifying model architectures or training objectives, it focuses on improving the quality and quantity of data itself. The resulting datasets can be used with standard supervised learning techniques or combined with other approaches, including physics-informed models and few-shot learning systems.
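As a rough illustration of that last point, the sketch below feeds simulator-generated, simulator-labeled samples into an ordinary supervised fit. Everything here is an assumption made for illustration: a toy free-fall simulator stands in for a domain-specific one, the noise level is arbitrary, and the learner is a one-parameter least-squares regression rather than any model Mantis is known to use.

```python
import math
import random

G = 9.81  # m/s^2, gravitational acceleration used by the toy simulator


def simulate_fall_time(height, rng, noise_std=0.01):
    """Noisy fall time (s) for an object dropped from `height` metres."""
    return math.sqrt(2.0 * height / G) + rng.gauss(0.0, noise_std)


def make_dataset(n_samples=200, seed=0):
    """Generate (feature, label) pairs entirely from simulation."""
    rng = random.Random(seed)
    heights = [rng.uniform(1.0, 50.0) for _ in range(n_samples)]
    # Feature: squared fall time. Label: drop height, known from the simulator.
    return [(simulate_fall_time(h, rng) ** 2, h) for h in heights]


def fit_through_origin(pairs):
    """Least-squares slope a for the model label = a * feature."""
    sum_xy = sum(x * y for x, y in pairs)
    sum_xx = sum(x * x for x, _ in pairs)
    return sum_xy / sum_xx


train = make_dataset()
slope = fit_through_origin(train)
print(f"learned slope: {slope:.3f} (physics predicts G/2 = {G / 2:.3f})")
```

Because the labels come from the same physics that produced the features, the fitted slope should land close to G/2, the value implied by h = (G/2)t². The training code itself needs no change to consume synthetic rather than collected data.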
The company’s future plans include broader validation on tasks such as spatiotemporal prediction and pose estimation, as well as extending the system to additional classes of physics simulations.