Background
For many young children, speech correction is not always achievable through traditional educational methods alone. Recently, AI-powered pronunciation correction services have emerged as a promising alternative. These services rely on lip-reading AI technology that analyzes mouth movements and voice data.
To accelerate the commercialization of this technology, Gendive successfully executed a project to build a high-quality lip-reading speech dataset for children.
Project Overview: Collecting Child Lip-Reading Video and Voice Data

The project focused on approximately 200 children aged 6 to 12, with the following data collection and processing requirements from the client:
Scripted speech prompts and video recordings (mp4) of children’s pronunciations
Multi-angle lip-reading video capture
Transcription and text normalization of speech content
Audio extraction (wav format) and alignment with video data
Final deliverable in JSON format
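As a rough illustration of how a JSON deliverable covering these requirements might be organized, the Python sketch below assembles one record per utterance. The field names, file names, and camera-angle labels are all hypothetical. The actual schema delivered to the client is not described in this case study.

```python
import json


def build_record(child_id, age, gender, transcript, video_files, audio_file):
    """Assemble one hypothetical per-utterance record (illustrative schema only)."""
    return {
        "child_id": child_id,
        "age": age,
        "gender": gender,
        "transcript": transcript,   # normalized text of the spoken sentence
        "video": video_files,       # one mp4 path per camera angle
        "audio": audio_file,        # wav file extracted from the recording
    }


# Example values are invented for illustration.
record = build_record(
    child_id="c001",
    age=8,
    gender="F",
    transcript="the cat sat on the mat",
    video_files={"front": "c001_s01_front.mp4", "side": "c001_s01_side.mp4"},
    audio_file="c001_s01.wav",
)

print(json.dumps(record, indent=2, ensure_ascii=False))
```

Keeping the video paths in a mapping keyed by angle is one way to tie the multi-angle captures of a single utterance to the same transcript and audio file.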
The Challenge: Handling Sensitive Data with Compliance

Collecting data from young children posed multiple complexities:
Legal compliance: Collection required parental consent forms, portrait rights agreements, etc.
Participation rates: Encouraging engagement from children and guardians was challenging
Multi-angle video recording: Required specific equipment and filming conditions
High-quality refinement: Needed denoising, precise transcription, and labeling
Gendive’s Solution

1. Optimized Field Setup for High Participation
To increase engagement, Gendive created a friendly and comfortable filming environment. Sentences were extracted from popular children's books to ensure familiarity and interest.
Result: Over 900 mp4 video recordings were collected.
2. Accurate Transcription and Metadata Tagging
Approximately 2,000 sentences were transcribed from the recordings, an average of about 11 sentences per child.
Noise and mispronunciations were removed, and metadata such as age, gender, and speech context were tagged to enhance dataset usability.
3. Frame-Level Audio-Video Synchronization
Using our in-house annotation tools, we synchronized mp4 video and wav audio data at the frame level.
Final output: a structured dataset in AI-ready JSON format.
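To make the idea of frame-level alignment concrete, here is a minimal Python sketch that maps wav sample positions to video frame indices and back. It is not Gendive's in-house tooling. It assumes a constant integer frame rate, audio and video that start at the same instant, and example values (16,000 Hz audio, 30 fps video) that are not stated in this case study.

```python
def sample_to_frame(sample_index, sample_rate, fps):
    """Index of the video frame on screen when the given audio sample plays."""
    return sample_index * fps // sample_rate


def frame_to_sample_range(frame_index, sample_rate, fps):
    """Half-open range [start, end) of audio samples that fall within one frame."""
    # -(-a // b) is integer ceiling division, which avoids float rounding.
    start = -(-frame_index * sample_rate // fps)
    end = -(-(frame_index + 1) * sample_rate // fps)
    return start, end


def segment_to_frames(start_sample, end_sample, sample_rate, fps):
    """First and last frame covered by an audio segment [start_sample, end_sample)."""
    first = sample_to_frame(start_sample, sample_rate, fps)
    last = sample_to_frame(end_sample - 1, sample_rate, fps)
    return first, last


# Hypothetical example: a sentence spoken from 0.5 s to 1.5 s of the recording.
SAMPLE_RATE = 16_000  # assumed wav sample rate
FPS = 30              # assumed video frame rate
first, last = segment_to_frames(8_000, 24_000, SAMPLE_RATE, FPS)
print(f"sentence spans video frames {first} to {last}")
```

Working in integer sample and frame indices rather than floating-point seconds keeps the two mappings consistent with each other, so every sample in a frame's range maps back to that same frame.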
Project Outcomes: A Model Case in Ethical, High-Quality AI Data

This project demonstrated Gendive’s strength in building specialized AI datasets under complex conditions:
Produced a rare dataset of children’s lip-reading video and speech data
Set a standard for ethical, consent-based data collection
Provided multi-angle synced video/audio data
Applied Gendive's AI Data Quality Management Guidelines v3.1 for model-optimized refinement
Why Gendive?
Proven experience across diverse AI data projects
Full-cycle data lifecycle management and quality control
Practical expertise in collecting data from sensitive or hard-to-reach populations
Custom delivery format and structure aligned to client needs
Planning to Build AI with Specialized Data?
At Gendive, we specialize in data collection and annotation for hard-to-access demographics such as children, seniors, and individuals with disabilities.
Let’s talk about how we can support your next AI initiative.
Share this case study with your network to spread the word!
Build your AI foundation with Gendive.