vAIsual Launches First Asian Diaspora Non-Biometric Dataset

vAIsual Inc, the company behind the largest visual dataset collection in the world, today launched the first of its Asian diaspora non-biometric datasets, consisting of thousands of images of Asian people and scenes.

The Asian People in Context dataset, with over 20,000 images, will play a crucial role in training AI models to recognize, classify, and analyze Asian scenes and characters. The resulting trained models can contribute to applications for environment detection, generative AI and human identification.

The dataset is the first of many delivered through a partnership with Vietnam-based stock agency Dragonimages.

Machine learning researchers and data scientists can use the datasets for a variety of purposes. All the images are legally cleared with model releases and trademark compliance. The datasets are available to non-European customers only, because the model releases are not GDPR-compliant.

The images feature mostly Asian people, of various ages and genders, in a range of contexts, including streets, cafes, workplaces and retail settings.

The datasets are specially prepared to meet the needs of ML teams, with detailed and consistent metatags, high-resolution images and, most importantly, legal clearances.

Self-service access to the datasets is via the Dataset Shop, established in 2022 by clean data specialists vAIsual Inc, and specifically catering to research and engineering teams training AI for a range of applications.

According to vAIsual CEO Michael Osterrieder, diversity is king in AI training, and the company's customers have been anticipating access to datasets with Asian identities.

"We are excited to launch this first Asian People in Context dataset that focuses on the Asian diaspora. Using our proprietary dataset-building technology, we can now assemble datasets consisting of tens of thousands of images of a particular theme or subject.

"Being able to collate and package these datasets saves hundreds of hours for engineers to prepare material for AI training," says Osterrieder.

While reducing time is a core benefit, Osterrieder also emphasizes the importance of having full legal clearance.

"We are starting to see dataset disclosure requirements emerging in some jurisdictions, which will mean any AI model trained on scraped data will risk being blocked," says Osterrieder.

The availability of legally clean datasets that also remunerate the original content creators is an important step toward ensuring that companies building AI technology do so ethically and responsibly.

"Offering custom-prepared datasets containing premium visual content, with the consent of the original copyright owners (or their legal representatives), is essential for the AI industry to mature into a truly commercial and viable industry," says Osterrieder.

In the coming weeks, additional datasets will be added to datasetshop.com. The datasets are specially prepared for engineers to add to their workflow for AI training and are commercially available in a variety of resolutions.
