Video Shot and Scene Segmentation

May 05, 2014

Description

This is a tool for the automatic temporal segmentation of videos into shots and scenes. Shots are sequences of consecutive frames captured without interruption by a single camera. The transition between two successive shots can be abrupt, where one frame belongs to a shot and the following frame belongs to the next shot, or gradual, where two shots are combined using chromatic, spatial or spatial-chromatic video production effects (e.g., fade in/out, dissolve, wipe) that gradually replace one shot with another. The tool detects both types of transitions. Scenes are higher-level temporal fragments that correspond to the story-telling parts of the video. They are formed by grouping the detected shots into semantically coherent temporal video fragments. Parallel computing, either CPU-based (via multi-threading) or GPU-based (via CUDA programming), is used, depending on the version of the tool, to make the whole video analysis several times faster than real-time processing.

Usage

Download the appropriate archive (tar.gz or zip) from the list below.
Unpack the archive into a new folder.
Run the software following the instructions in the included "documentation.pdf" file.

Output

After the processing is finished, the folder that contains the analyzed video will also contain four newly created result files:

The file "<video_name>_shots.txt" contains information about the automatically detected shots (listed in consecutive lines of the file). For each shot, the software writes one line with the starting and ending frames of the shot, plus three intermediate frames that can be used as representative keyframes of the shot in other visual analysis tasks (e.g., video concept detection and video event detection).
The file "<video_name>_shots.srt" contains the automatically detected shots in SubRip (.srt) subtitle format, so that the user can visually check the output (after renaming the file to match the name of the video, and using e.g. the VLC player or another player).
The file "<video_name>_scenes.txt" contains information about the automatically detected scenes. For each scene, the software writes one line with seven (7) numbers. The first two numbers are the starting and ending frames of the scene, while the next five numbers are the five most representative keyframes of the scene. If the last two numbers of a scene are zero, the scene consists of only one shot, and only three keyframes are listed.
The file "<video_name>_scenes.srt" contains the automatically detected scenes in SubRip (.srt) subtitle format, so that the user can visually check the output (after renaming the file to match the name of the video, and using e.g. the VLC player or another player).
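The two .txt files described above can be loaded with a few lines of Python. The following is a minimal sketch, not part of the tool: it assumes that the numbers on each line are separated by whitespace or commas (the exact delimiter is documented in "documentation.pdf", not here), and the names Segment, parse_segments, parse_shots and parse_scenes are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    start: int            # first frame of the shot or scene
    end: int              # last frame of the shot or scene
    keyframes: List[int]  # representative keyframes


def parse_segments(lines: Iterable[str], n_fields: int) -> List[Segment]:
    """Parse lines of integers into Segment records.

    ASSUMPTION: values are separated by whitespace or commas. Lines that
    do not hold exactly n_fields non-negative integers are skipped.
    """
    segments = []
    for line in lines:
        tokens = line.replace(",", " ").split()
        if len(tokens) != n_fields or not all(t.isdigit() for t in tokens):
            continue
        values = [int(t) for t in tokens]
        segments.append(Segment(values[0], values[1], values[2:]))
    return segments


def parse_shots(lines: Iterable[str]) -> List[Segment]:
    # Each shot line: start frame, end frame, three keyframes.
    return parse_segments(lines, 5)


def parse_scenes(lines: Iterable[str]) -> List[Segment]:
    # Each scene line: start frame, end frame, five keyframe slots.
    scenes = []
    for seg in parse_segments(lines, 7):
        keyframes = seg.keyframes
        # Two trailing zeros mark a single-shot scene with three keyframes.
        if keyframes[-2:] == [0, 0]:
            keyframes = keyframes[:-2]
        scenes.append(seg._replace(keyframes=keyframes))
    return scenes
```

Since an open text file is an iterable of lines, a file such as "myvideo_shots.txt" (a hypothetical name) could be read with `with open("myvideo_shots.txt") as f: shots = parse_shots(f)`.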

Prerequisites (only for the GPU-based versions of the tool)

NVIDIA graphics card that supports CUDA technology with the latest drivers installed
CUDA Toolkit version 5.5 installed (https://developer.nvidia.com/cuda-toolkit-55-archive)

Limitations

This free version of the software can process videos of duration up to 10 minutes each. For obtaining an unrestricted version, please contact us at: shotsegmentation@iti.gr

The software was tested, in terms of compatibility, using various video formats and codecs, including MPEG, MP4, AVI, WMV, MOV, WEBM, FLV, and to our knowledge all other video formats and codecs supported by FFmpeg can also be processed. However, if a problem with another type of video occurs, please contact us at: shotsegmentation@iti.gr

Version history

Version 1.0 (Released 02/05/2014)

This release is based on the OpenCV library version 2.4.7 and the CUDA Toolkit version 5.5. It performs only segmentation to shots. The ORB descriptor and HSV histograms are used for the representation of the visual content. The detection of shot transitions is performed by assessing the visual similarity between successive or non-successive video frames. False alarms are filtered out by exploiting motion information and a flashlight detector. Version 1.0 has been replaced by the newer version 1.1.
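The general idea of detecting an abrupt transition from the visual similarity of successive frames can be illustrated with a small, self-contained sketch. This is not the tool's implementation: it uses a single-channel histogram of plain pixel values instead of HSV histograms and ORB descriptors, it compares successive frames only, it has no false-alarm filtering, and the function names, the number of bins and the threshold of 0.5 are all assumptions chosen for illustration.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized histogram of a non-empty frame given as a flat list of
    integer pixel values in the range [0, max_value)."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // max_value] += 1
    total = float(len(frame))
    return [c / total for c in counts]


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical normalized histograms,
    0.0 for histograms with no overlap."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def detect_abrupt_transitions(frames, threshold=0.5):
    """Return every index i at which frame i is dissimilar to frame i - 1,
    i.e. the first frame of a new shot under this simplified criterion."""
    if not frames:
        return []
    cuts = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        if similarity(previous, current) < threshold:
            cuts.append(i)
        previous = current
    return cuts


# Synthetic example: three dark frames followed by two bright frames.
dark = [10] * 16
bright = [200] * 16
cuts = detect_abrupt_transitions([dark, dark, dark, bright, bright])
```

In this synthetic example the only detected cut is at index 3, the first bright frame. Gradual transitions would not be caught by such a pairwise test, because each frame differs only slightly from its predecessor.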

Version 1.1 (Released 30/05/2014)

This release is based on the OpenCV library version 2.4.7 and the CUDA Toolkit version 5.5. It performs only segmentation to shots. It improves the detection of gradual transitions by replacing the use of motion information with two gradual transition detectors. The first one detects wipe transitions. The second one detects dissolve and fade in/out transitions. Version 1.1 has been replaced by the newer version 1.2.

Version 1.2 (Released 24/06/2014)

This release is based on the OpenCV library version 2.4.7 and the CUDA Toolkit version 5.5. It extends the previous version by integrating algorithms for video scene segmentation and keyframe selection (up to 5 most representative keyframes of each scene are identified), on top of the shot segmentation. The new additions do not affect the processing speed, which remains at least 2 times faster than real-time processing (depending on the processing capability of the graphics card) for the entire video analysis chain (i.e., shot segmentation, scene segmentation and keyframe selection). Version 1.2 has been replaced by the newer version 1.3.

Version 1.3 (Released 19/08/2014)

This release is based on the OpenCV library version 2.4.7. It uses exclusively CPU-based processing, while the analysis is accelerated by invoking multi-threading / multi-processing operations of the Intel OpenMP runtime library. This release is about 7 times faster than real-time processing, on an Intel i7 PC.

Version 1.4 (Released 09/09/2014)

This release is the same as version 1.3, but shipped with runtime libraries of smaller size, resulting in a much more compact software package.

Version 1.4.1 (Released 26/11/2014)

This release is an update of version 1.4, after fixing a number of bugs.

Version 1.4.2 (Released 29/07/2015)

This release is an update of version 1.4.1 that uses a newer version of the OpenCV library (ver. 2.4.9), ships with a much smaller number of runtime libraries, and fixes a few bugs.

Version 1.4.3 (Released 05/08/2016)

This release is a compatibility update of version 1.4.2 that runs on Win10 and on Ubuntu 14 and 16 releases; it also includes a few minor bug fixes.

Version 1.4.4 (Released 10/04/2017)

This release introduces four optional arguments in the software's argument list, which allow the user to: i) run shot segmentation only, instead of both shot and scene segmentation, ii) include in the output result files information about the type of each detected shot transition (abrupt, dissolve, wipe), iii) choose the output path of the extracted keyframes, and iv) choose the number of extracted keyframes per shot.

Version 1.4.5 (Released 27/04/2018)

This release is an update of version 1.4.4 with some bug fixes.

Note: in recent tests on a PC with Windows 7, an Intel Core i7-2600K @ 3.40 GHz, 8 GB RAM, and an NVIDIA GeForce GTX 560, our latest CPU version (v1.4.x, found under "Latest Edition" below) took about 13.5% of the duration of the video to complete its processing (i.e., for a 10-min 480x360 video the processing took about 1 min 20 sec), while the GPU version (found under "Other compatibility options" below) took about 32.5%. We therefore suggest using the CPU version (v1.4.x) below; you may also want to try the GPU version if your GPU hardware is significantly better than that of our test PC.
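The percentages in the note above translate into processing times and speed-up factors as follows. This is only a worked arithmetic check of the reported figures; the function names are illustrative and not part of the tool.

```python
def processing_time_seconds(video_seconds, fraction_of_duration):
    """Processing time, given the fraction of the video duration it takes."""
    return video_seconds * fraction_of_duration


def speedup_over_real_time(fraction_of_duration):
    """How many times faster than real time the processing runs."""
    return 1.0 / fraction_of_duration


video_seconds = 10 * 60  # the 10-minute test video

# CPU version: about 13.5% of the video duration, i.e. roughly 81 s,
# which matches the reported "about 1 min 20 sec".
cpu_seconds = processing_time_seconds(video_seconds, 0.135)

# GPU version: about 32.5% of the video duration, i.e. roughly 195 s.
gpu_seconds = processing_time_seconds(video_seconds, 0.325)

cpu_speedup = speedup_over_real_time(0.135)  # a little over 7x
gpu_speedup = speedup_over_real_time(0.325)  # a little over 3x
```

These values agree with the figures in the version history: about 7 times faster than real time for the CPU version, and at least 2 times faster for the GPU version.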

Downloads

Latest Edition (v.1.4.5)

Windows 64-bit v1.4.5.zip (CPU-version 1.4.5) / Compatible with: 64-bit installations of Windows (XP, Vista, Win7, Win8/8.1, Win10)
Ubuntu 16.04 64-bit v1.4.5.tar.gz (CPU-version 1.4.5) / Compatible with: 64-bit installations of Ubuntu 16.04

Previous Editions

Windows 64-bit v1.4.4.tar.gz (CPU-version 1.4.4) / Compatible with: 64-bit installations of Windows (XP, Vista, Win7, Win8/8.1, Win10)
Ubuntu 16.04 64-bit v1.4.4.tar.gz (CPU-version 1.4.4) / Compatible with: 64-bit installations of Ubuntu 16.04

Windows 32-bit v1.4.3.tar.gz (CPU-version 1.4.3) / Compatible with: 32-bit installations of Windows (XP, Vista, Win7, Win8/8.1, Win10)
Ubuntu 14.04 64-bit v1.4.3.tar.gz (CPU-version 1.4.3) / Compatible with: 64-bit installations of Ubuntu 14.04

Other compatibility options

Windows 64-bit v1.2.tar.gz (GPU-version 1.2) / Compatible with: Windows 7, Windows 8, and some Windows XP and Windows Vista installations with multi-threading support (Intel TBB library)
Windows 32-bit tbb v1.2.tar.gz (GPU-version 1.2) / Compatible with: Windows 7, Windows 8, and some Windows XP and Windows Vista installations with multi-threading support (Intel TBB library)
Windows 32-bit NOtbb v1.2.tar.gz (GPU-version 1.2) / Compatible with: Windows XP and Windows Vista installations without multi-threading support

Publication

This software is based exclusively on free-to-use and non-patented modules of the OpenCV library, and implements variations of the shot segmentation and scene segmentation algorithms introduced in the following publications [1] and [2] (please cite these papers if you use the provided software):

[1] E. Apostolidis, V. Mezaris, "Fast Shot Segmentation Combining Global and Local Visual Descriptors", Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, May 2014.

[2] P. Sidiropoulos, V. Mezaris, I. Kompatsiaris, H. Meinedo, M. Bugalho, I. Trancoso, "Temporal video segmentation to scenes using high-level audiovisual features", IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 8, pp. 1163-1177, August 2011.

License

Redistribution and use in binary form, without modification, is permitted provided that the following conditions are met:

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.

This software is provided by the authors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the authors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

The implementation of certain components used by this software was based on the OpenCV library. Below is the original copyright.

License Agreement

For Open Source Computer Vision Library

Third party copyrights are property of their respective owners.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

* The name of the copyright holders may not be used to endorse or promote product derived from this software without specific prior written permission.

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the Intel Corporation or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

The implementation of certain components used by some versions of this software was based on the Intel OpenMP runtime library. Below is the original copyright.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright holder or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

Acknowledgements

This work was supported in part by the European Commission under contracts FP7-287911 LinkedTV and FP7-318101 MediaMixer.

Contacts

You may contact the software development team (Vasileios Mezaris, Evlampios Apostolidis and Alexandros Pournaras) by sending an e-mail to shotsegmentation@iti.gr with any question or remark you may have about this tool.
