Google’s “AutoFlip” Is Designed to Crop Videos Intelligently

Traditionally, people watched videos on TVs with a 16:9 or 4:3 aspect ratio. However, with recent devices, people view and create videos in an array of aspect ratios. Cropping videos to fit the screens of all these devices is a tedious task for video curators. Thankfully, Google has built a tool to handle this kind of cropping automatically.

Recently in a blog post, Google announced an open-source tool for reframing and cropping videos to fit any screen. AutoFlip is a tool that uses machine learning (ML)-based object detection and tracking technology to reframe videos automatically.

AutoFlip – For Intelligent Video Cropping

Google created this tool to get rid of the conventional static cropping method for reframing videos. Static cropping is an unreliable technique: a fixed camera viewport is specified for the video, and everything outside that area is cropped away. This method often produces undesirable results.
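
For comparison, a naive static crop of this kind takes only a few lines. It simply keeps the centre of every frame, regardless of where the subject is (a hypothetical sketch, not anything from AutoFlip itself):

```python
def static_center_crop(src_w, src_h, target_aspect):
    """Return one fixed (x, y, w, h) viewport for the whole video:
    the centre region matching the target aspect ratio."""
    if src_w / src_h > target_aspect:
        crop_w, crop_h = int(src_h * target_aspect), src_h   # trim the sides
    else:
        crop_w, crop_h = src_w, int(src_w / target_aspect)   # trim top and bottom
    return ((src_w - crop_w) // 2, (src_h - crop_h) // 2, crop_w, crop_h)

# A 1920x1080 landscape video cropped to 9:16 portrait: only a 607-pixel-wide
# centre strip survives, even if the subject is standing at the edge of the frame.
print(static_center_crop(1920, 1080, 9 / 16))
```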

Google's AutoFlip pipeline is built around three main stages: shot detection, video content analysis and, lastly, reframing. Let me briefly break down each of these stages.

Shot (Scene) Detection

A scene or a shot in a video is a continuous sequence of frames without any cuts. Google's AutoFlip can detect a change of shot or scene by comparing the colour histogram of each new frame with those of the previous frames: a shot change is flagged when the distribution of frame colours changes at a rate different from that of a sliding historical window. To optimise the reframing process, the tool buffers the whole video before making any reframing decisions.
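
To give you an idea of how histogram-based shot detection works, here is a minimal sketch using OpenCV. This is only an illustration of the idea, not AutoFlip's actual MediaPipe implementation, and the window size and threshold are arbitrary assumptions:

```python
import cv2
import numpy as np
from collections import deque

def detect_shot_changes(video_path, window=30, threshold=0.5):
    """Flag frames whose colour histogram deviates sharply from a
    sliding window of recent frames."""
    cap = cv2.VideoCapture(video_path)
    history = deque(maxlen=window)   # sliding historical window of histograms
    shot_changes = []
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # 3D colour histogram over the B, G, R channels, normalised
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if history:
            avg = np.mean(history, axis=0)
            # Bhattacharyya distance: 0 = identical distributions, 1 = very different
            dist = cv2.compareHist(avg.astype(np.float32), hist.astype(np.float32),
                                   cv2.HISTCMP_BHATTACHARYYA)
            if dist > threshold:
                shot_changes.append(frame_idx)
                history.clear()      # start accumulating a new shot
        history.append(hist)
        frame_idx += 1
    cap.release()
    return shot_changes

print(detect_shot_changes("input.mp4"))
```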

Video Content Analysis

In this stage, the tool detects important objects and people in the video, using deep learning-based object detection models. It can even detect text overlays, brand logos and other elements such as motion or the ball in sports videos. The face and object detection models are integrated into the tool through MediaPipe, a framework for building pipelines that process multimodal data. MediaPipe uses Google's TensorFlow Lite ML framework on CPUs.
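
To see what this detection step looks like in practice, here is a minimal sketch using MediaPipe's Python face-detection solution. AutoFlip itself runs as a C++ MediaPipe graph, so treat this only as an approximation of the idea; the file name is a placeholder:

```python
import cv2
import mediapipe as mp

mp_face = mp.solutions.face_detection

def detect_faces(video_path):
    """Return per-frame relative bounding boxes for detected faces."""
    cap = cv2.VideoCapture(video_path)
    boxes_per_frame = []
    with mp_face.FaceDetection(model_selection=1,
                               min_detection_confidence=0.5) as detector:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input, OpenCV delivers BGR
            results = detector.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            boxes = []
            for det in results.detections or []:
                bb = det.location_data.relative_bounding_box
                boxes.append((bb.xmin, bb.ymin, bb.width, bb.height))
            boxes_per_frame.append(boxes)
    cap.release()
    return boxes_per_frame

print(detect_faces("input.mp4")[:5])
```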

Reframing

After identifying people and objects in a video, the tool makes logical decisions on how to reframe it. AutoFlip chooses one of three reframing strategies to crop the content – stationary, panning or tracking – picking the optimal one based on the content of the video. In stationary mode, the reframed camera viewport remains fixed in a position that keeps the important content of the shot in view. For shots that contain motion, panning moves the reframed camera viewport at a constant velocity. And when an interesting subject keeps moving around within the frame, tracking mode follows it continuously.
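
Google hasn't published the exact decision logic, but a simple heuristic in the same spirit could pick a strategy from how much the detected subject moves within a shot (a hypothetical sketch; the thresholds are made up):

```python
import numpy as np

def choose_strategy(centers_x, frame_width, still_tol=0.03, steady_tol=0.15):
    """Pick a reframing strategy from the horizontal movement of per-frame
    subject centres within one shot. Thresholds are illustrative only."""
    xs = np.asarray(centers_x, dtype=float) / frame_width   # normalise to 0..1
    spread = xs.max() - xs.min()                             # total horizontal travel
    slope = np.polyfit(np.arange(len(xs)), xs, 1)[0]         # linear trend per frame
    drift = abs(slope) * (len(xs) - 1)                       # displacement from that trend
    if spread < still_tol:
        return "stationary"     # subject barely moves: keep a fixed viewport
    if abs(spread - drift) < steady_tol:
        return "panning"        # motion is roughly constant-velocity
    return "tracking"           # irregular motion: follow the subject

# Example: subject drifting steadily to the right in a 1920-pixel-wide frame
print(choose_strategy([600, 700, 800, 900, 1000], 1920))
```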

Based on the reframing strategy chosen by the algorithm, AutoFlip then sets an optimised cropping window for each frame, preserving the important content of the video in the best possible way.
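
As a rough illustration of that last step, here is a hypothetical sketch that turns per-frame subject positions into crop windows for a target aspect ratio, depending on the chosen strategy (again, not AutoFlip's real implementation, and it assumes the target viewport is narrower than the source frame):

```python
import numpy as np

def crop_windows(centers_x, src_w, src_h, target_aspect, strategy):
    """Compute an (x, y, w, h) crop window per frame for a target aspect
    ratio (e.g. 9/16 for portrait), keeping the full frame height."""
    crop_w = int(src_h * target_aspect)        # width of the cropped viewport
    centers = np.asarray(centers_x, dtype=float)
    if strategy == "stationary":
        # one fixed viewport centred on the median subject position
        centers = np.full_like(centers, np.median(centers))
    elif strategy == "panning":
        # constant-velocity sweep from the first to the last subject position
        centers = np.linspace(centers[0], centers[-1], len(centers))
    else:  # "tracking": exponentially smooth the raw positions to avoid jitter
        smoothed = [centers[0]]
        for c in centers[1:]:
            smoothed.append(0.8 * smoothed[-1] + 0.2 * c)
        centers = np.array(smoothed)
    # clamp so the viewport stays inside the source frame
    left = np.clip(centers - crop_w / 2, 0, src_w - crop_w).astype(int)
    return [(x, 0, crop_w, src_h) for x in left]

# e.g. a 1920x1080 landscape shot reframed to 9:16 portrait in tracking mode
print(crop_windows([900, 910, 950, 1000], 1920, 1080, 9 / 16, "tracking"))
```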

Google released this tool directly to developers and film-makers, aiming to “reduce the barriers to their design creativity and reach through the automation of video editing”. From landscape to portrait or portrait to landscape, whatever the case, AutoFlip is designed to deliver the best possible result.

