Mitigating pitfalls of AI training with diverse digital humans

By Steve Harris, CEO of Mindtech.

Friday, 25th August 2023

The term ‘digital human’ may be met with some confusion and caution. With the advancement of AI, it may already feel as though we live intimately with human-like technology. But digital humans are a disruptive solution for organisations looking to address the implications that biased datasets can have on the training of AI systems.

Humans are incredibly diverse, so a lack of representation in training data can severely hinder the performance of computer vision systems. A dataset that predominantly represents a particular demographic, whether by gender, skin colour, or any other visual feature, can result in models failing to recognise individuals beyond that group. Biased predictions and discriminatory outcomes can then be expected when the model is deployed in real-world applications. While this is increasingly recognised as an important issue, simply collecting more data is not as straightforward as it may seem: companies are seeking availability, scalability and affordability, which real-world data alone struggles to provide.

What is a digital human?

Digital humans are computer-generated, human-like virtual beings that can be placed in computer-generated scenarios to emulate real-world behaviours and appearances. It is worth noting that digital humans are distinct from avatars, which are purely a visual representation of a ‘character’ in a virtual world. There is no requirement for an avatar to look like a human (think of the characters in the film of the same name, for example), whereas digital humans are designed to be as photo-real and as indistinguishable from actual humans as possible. They may also be designed to mimic human behaviours and movements, in which case they need to be fully articulated and exhibit limb and joint movements similar to a human's (compare this with the static digital humans commonly used by architecture firms, which are simply posed in a single position to give an ‘artist's impression’).

Digital Humans for Computer Vision Training AKA Synthetic Data

By addressing the limitations of real-world data, digital humans offer a new way to train computer vision systems, and they are here to stay. They provide a scalable way to redefine how AI computer vision systems are trained and tested, and their business value should not be underestimated. In fact, the global market for digital humans is expected to grow from $10 billion in 2020 to about $530 billion in 2030. There is no doubt that digital humans are going to change the way AI systems operate for the better. But how do they avoid the pitfalls of unrepresentative datasets?

Navigating the privacy pitfalls

Privacy concerns have long plagued data practices. Digital humans can be generated entirely synthetically, using computer simulation to create characteristic human-like features and behaviours. The alternative often involves collecting and using personal data, such as photographs, videos, and voice recordings, all of which can be susceptible to privacy breaches.

Digital humans therefore have the potential to resolve these growing privacy concerns. Because they eliminate the need for personal data, AI models can be trained on datasets that contain no personally identifiable information, avoiding these privacy risks while still delivering accurate performance.

Mindtech’s Digital Humans

Mindtech’s Chameleon platform has a unique and advanced implementation of digital humans. Mindtech’s ‘configurable DNA technology’ makes it possible to scale much more easily than when relying on real-world data alone: synthetic data can be generated in large volumes and annotated quickly, steps that are otherwise time-consuming and labour-intensive. Streamlining this process is particularly useful where real-world data is scarce, and it means faster training times as well as consequent cost savings.
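The specifics of Chameleon's tooling aren't covered here, but the general idea of parameterised generation with built-in annotation can be sketched in a few lines of Python. Everything below, including the HumanSpec fields, the attribute values and the file paths, is a hypothetical illustration rather than Mindtech's actual interface; a real pipeline would call a renderer where the placeholder image path is written.

```python
# Hypothetical sketch of a synthetic-data generation loop.
# None of these names come from Mindtech's Chameleon platform; they
# illustrate the general idea of parameterised generation plus auto-annotation.
import json
import random
from dataclasses import dataclass, asdict

@dataclass
class HumanSpec:
    skin_tone: int      # e.g. Fitzpatrick scale 1-6
    age_band: str       # "child", "adult", "senior"
    body_type: str      # "slim", "average", "plus"
    pose: str           # "standing", "walking", "sitting"

def random_spec(rng: random.Random) -> HumanSpec:
    """Sample one digital-human configuration from the parameter space."""
    return HumanSpec(
        skin_tone=rng.randint(1, 6),
        age_band=rng.choice(["child", "adult", "senior"]),
        body_type=rng.choice(["slim", "average", "plus"]),
        pose=rng.choice(["standing", "walking", "sitting"]),
    )

def generate_dataset(n_samples: int, seed: int = 0) -> list[dict]:
    """Return image/annotation pairs; the renderer is a stand-in."""
    rng = random.Random(seed)
    samples = []
    for i in range(n_samples):
        spec = random_spec(rng)
        image_path = f"synthetic/{i:06d}.png"   # a real pipeline would render here
        samples.append({"image": image_path, "labels": asdict(spec)})
    return samples

if __name__ == "__main__":
    dataset = generate_dataset(1000)
    with open("annotations.json", "w") as f:
        json.dump(dataset, f, indent=2)
```

Because the labels are known at generation time, annotation comes for free, which is where most of the time and cost savings over manually labelled real-world data arise.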

Diversity in AI

As mentioned, the versatile nature of digital humans means they can be created for a wide range of use cases, including scenarios in which biases or false representations of society might otherwise be perpetuated, becoming a source of unwelcome misinformation and discrimination. Comprehensive, representative datasets reduce the need for human intervention and provide the breadth of diversity needed to test a model and confirm it is not over-fitting to a particular look or style of person. We should therefore ensure that AI reflects all of its users and that we do not translate our biases and stereotypes into these systems.

Traditional data collection methods often produce limited datasets that lack diversity. By using synthetically generated digital humans, however, it is possible to create training datasets with inclusivity in mind. For example, synthetic data can be used to create digital humans with a range of skin tones, ages, facial features, and body types, ensuring that they are representative of the wider population. Synthetic data can also be used to create digital humans that reflect the needs and experiences of underrepresented groups, such as people with disabilities or other specialised needs. This can help make computer vision systems more inclusive and accessible from the get-go.

By manipulating the appearance and behaviour parameters of digital humans, teams can generate a virtually infinite amount of representative training data, expanding the dataset and reducing the risk of overfitting to specific characteristics or styles.
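As a rough illustration of how such coverage can be planned, the sketch below enumerates every combination of a few appearance attributes so that each group receives the same number of generated images. The attribute lists (Fitzpatrick skin tones, age bands, body types, mobility aids) are simplified assumptions made for the example, not a complete taxonomy.

```python
# Hypothetical sketch: enumerate every combination of appearance attributes
# so each group is equally represented in the generated training set.
from itertools import product

SKIN_TONES = [1, 2, 3, 4, 5, 6]              # Fitzpatrick scale
AGE_BANDS  = ["child", "adult", "senior"]
BODY_TYPES = ["slim", "average", "plus"]
MOBILITY   = ["none", "wheelchair", "cane"]  # example accessibility attributes

def balanced_plan(samples_per_group: int) -> list[dict]:
    """One entry per (skin tone, age, body type, mobility) combination."""
    plan = []
    for tone, age, body, mobility in product(SKIN_TONES, AGE_BANDS, BODY_TYPES, MOBILITY):
        plan.append({
            "skin_tone": tone,
            "age_band": age,
            "body_type": body,
            "mobility_aid": mobility,
            "count": samples_per_group,
        })
    return plan

plan = balanced_plan(samples_per_group=50)
print(f"{len(plan)} groups, {sum(g['count'] for g in plan)} images to generate")
# 6 * 3 * 3 * 3 = 162 groups -> 8,100 images, every group equally represented
```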

Because they provide a range of diverse simulations for testing, digital humans also allow for a more comprehensive evaluation of AI models. Highly accessible, versatile and flexible, they have the potential to radically change how AI vision systems are trained.
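One simple way to make that evaluation concrete is to report accuracy per demographic group rather than a single aggregate number, so that any group the model struggles with is immediately visible. The sketch below assumes a hypothetical list of per-image results tagged with a group label; the result format is illustrative only.

```python
# Hypothetical sketch: break test results down by demographic group so that
# under-performing groups are visible rather than hidden in one aggregate score.
from collections import defaultdict

def accuracy_by_group(results: list[dict]) -> dict[str, float]:
    """`results` entries look like {"group": str, "correct": bool} (illustrative format)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in results:
        totals[r["group"]] += 1
        hits[r["group"]] += int(r["correct"])
    return {group: hits[group] / totals[group] for group in totals}

# Toy usage with made-up results:
results = [
    {"group": "skin_tone_2", "correct": True},
    {"group": "skin_tone_2", "correct": True},
    {"group": "skin_tone_6", "correct": False},
    {"group": "skin_tone_6", "correct": True},
]
for group, acc in sorted(accuracy_by_group(results).items()):
    print(f"{group}: {acc:.0%}")
```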

Diverse datasets play a crucial role in prioritising inclusivity, improving fairness, and enhancing the overall performance and reliability of AI vision systems. Many predict that digital humans will assume a growing role in AI training, combating the drawbacks of real-world datasets and the bias and privacy risks that come with them. With a wealth of data at your fingertips, there is simply no excuse not to apply these practices and build scalable, privacy-compliant and inclusive digital humans that accurately reflect all users.