Background: Technological advancement has improved people's quality of life, and it has also led
people to produce large amounts of data in the form of text, images, and videos. Analyzing and
summarizing this data therefore requires significant effort and suitable methodologies so that it
can be managed within storage constraints. Video summaries can be generated either from keyframes
or from skims/shots. Here, keyframe extraction is based on deep learning-based object detection
techniques, and various object detection algorithms have been reviewed for generating and selecting
the best possible frames as keyframes. A set of frames is extracted from the original video
sequence and, depending on the technique used, one or more frames of the set are chosen as
keyframes, which then become part of the summarized video. This paper discusses the selection of
various keyframe extraction techniques in detail.
Methods: This paper focuses on summary generation for office surveillance videos, based primarily
on various keyframe extraction techniques. For this purpose, models such as MobileNet, SSD, and
YOLO are used. A comparative analysis of their efficiency showed that YOLO performs better than the
other models. Keyframe selection techniques such as sufficient content change, maximum frame
coverage, minimum correlation, curve simplification, and clustering based on human presence in the
frame have been implemented.
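To make the idea of "sufficient content change" concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes each frame is a flat sequence of grayscale intensities and uses the mean absolute pixel difference as the change measure; the function names and the threshold value are hypothetical.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        raise ValueError("frames must not be empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_by_content_change(
    frames: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Return indices of frames that differ enough from the last keyframe.

    Assumption: the first frame is always kept, and a later frame becomes a
    keyframe when its difference to the most recent keyframe exceeds
    `threshold` (a hypothetical tuning parameter).
    """
    if not frames:
        return []
    keyframes = [0]
    last = frames[0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], last) > threshold:
            keyframes.append(i)
            last = frames[i]
    return keyframes


# Toy 4-pixel "frames": small changes are skipped, large jumps are kept.
toy_frames = [[0] * 4, [1] * 4, [50] * 4, [52] * 4, [120] * 4]
selected = select_by_content_change(toy_frames, threshold=10)
```

Comparing against the last selected keyframe, rather than the immediately preceding frame, is one design choice among several; it prevents slow drift from going unnoticed, whereas a consecutive-frame comparison reacts only to abrupt changes.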
Results: Variable-length and fixed-length video summaries were generated and analyzed for each
keyframe selection technique on office surveillance videos. The analysis shows that the output
video obtained with the Clustering and Curve Simplification approaches is compressed to half the
size of the original video and therefore requires considerably less storage space. The technique
that selects keyframes based on the change in frame content between consecutive frames produces
the best output for office surveillance videos.
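The distinction between variable-length and fixed-length summaries can be illustrated with a small sketch. This is an assumption about how such summaries might be produced, not a description of the paper's procedure: it supposes that each frame already has a numeric importance score (for example, a content-change value), and the function names, `threshold`, and `k` are hypothetical.

```python
from typing import List, Sequence


def variable_length_summary(scores: Sequence[float], threshold: float) -> List[int]:
    """Keep every frame whose score exceeds the threshold.

    The summary length depends on the video content.
    """
    return [i for i, s in enumerate(scores) if s > threshold]


def fixed_length_summary(scores: Sequence[float], k: int) -> List[int]:
    """Keep the k highest-scoring frames, returned in temporal order.

    The summary length is fixed in advance (or shorter if the video has
    fewer than k frames).
    """
    if k <= 0:
        return []
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical per-frame scores for a six-frame clip.
frame_scores = [0.0, 5.0, 1.0, 9.0, 2.0, 7.0]
variable = variable_length_summary(frame_scores, threshold=4.0)
fixed = fixed_length_summary(frame_scores, k=2)
```

Under these assumptions, the thresholded summary grows with the amount of activity in the video, while the top-k summary trades completeness for a predictable storage footprint.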
Conclusion: In this paper, we discussed the process of generating a synopsis of a video that
highlights the important portions and discards the trivial and redundant parts. We first described
various object detection algorithms such as YOLO and SSD, used in conjunction with neural networks
such as MobileNet, to obtain the probabilistic score of an object present in the video. These
algorithms generate, for every frame of the input video, the probability that a person appears in
the image. The results of object detection are then passed to keyframe extraction algorithms to
obtain the summarized video. Our comparative analysis of keyframe selection techniques for office
videos will help in determining which keyframe selection technique is preferable.
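The hand-off from object detection to keyframe extraction can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's clustering method: it takes a list of per-frame person probabilities (as a detector such as YOLO might produce), groups consecutive frames in which a person is likely present, and keeps the most confident frame of each group. The function name and the `presence_threshold` default are hypothetical.

```python
from typing import List, Optional, Sequence


def keyframes_from_person_scores(
    scores: Sequence[float], presence_threshold: float = 0.5
) -> List[int]:
    """Pick one keyframe per run of consecutive frames with a person present.

    Assumption: a frame "contains a person" when its probability is at least
    `presence_threshold`; within each run, the frame with the highest
    probability is selected.
    """
    keyframes: List[int] = []
    best_idx: Optional[int] = None
    for i, p in enumerate(scores):
        if p >= presence_threshold:
            # Inside a run: remember the most confident frame so far.
            if best_idx is None or p > scores[best_idx]:
                best_idx = i
        elif best_idx is not None:
            # The run just ended: emit its best frame.
            keyframes.append(best_idx)
            best_idx = None
    if best_idx is not None:
        keyframes.append(best_idx)
    return keyframes


# Hypothetical detector output for a nine-frame clip with two appearances.
person_probs = [0.1, 0.6, 0.9, 0.7, 0.2, 0.1, 0.8, 0.85, 0.3]
picked = keyframes_from_person_scores(person_probs)
```

Grouping by temporal adjacency is only one way to cluster frames by human presence; a feature-based clustering would group visually similar frames even when they are far apart in time.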