To facilitate research on the OuluVS2 database, we have pre-processed the raw video data to extract the regions of interest (ROIs) that contain the talking mouth. The figure on the right shows some examples of the extracted ROI images.

The preprocessing involves segmenting long videos into short ones that each contain an individual utterance, locating facial landmarks, aligning video frames to remove irrelevant head motion, correcting head poses, and finally cropping out the ROIs. So far, we have segmented all the HD data and extracted ROIs from the segmented HD videos that record digit strings and short phrases.
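
As an illustration of the final cropping step only, the sketch below cuts a square region around a set of mouth landmarks. It is not the code used to produce the released ROI videos: the function name `crop_mouth_roi`, the landmark array layout, and the `margin` value are all assumptions made for this example.

```python
import numpy as np


def crop_mouth_roi(frame, mouth_landmarks, margin=0.3):
    """Crop a square region around the mouth landmarks (illustrative sketch).

    frame: H x W (x C) image array.
    mouth_landmarks: N x 2 array of (x, y) pixel coordinates (assumed layout).
    margin: fraction of the mouth extent added on each side (assumed value).
    """
    pts = np.asarray(mouth_landmarks, dtype=float)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)

    # Square box centred on the mouth, sized from the larger extent plus margin.
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half = max(x_max - x_min, y_max - y_min) * (0.5 + margin)

    h, w = frame.shape[:2]
    # Clamp to the frame so the crop never indexes outside the image.
    x0 = max(int(round(cx - half)), 0)
    y0 = max(int(round(cy - half)), 0)
    x1 = min(int(round(cx + half)), w)
    y1 = min(int(round(cy + half)), h)
    return frame[y0:y1, x0:x1]
```

For example, calling `crop_mouth_roi(frame, landmarks)` on an aligned frame with four landmarks at the mouth corners and lip midpoints returns a sub-image centred on the mouth. In practice, the crops would usually be resized to a common resolution before being written out as an ROI video.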

The table below lists the sizes of pre-processed data.

Contents                      Size
Full-face HD videos           >49GB
ROI videos (digit strings)    <1GB
ROI videos (phrases)          <150MB
Audio (WAV)                   <2GB


Through the links below, you can view the ROI videos extracted from the sample videos provided on the Database Details page.