Learning from videos to understand the world

Today, we’re announcing a project called Learning from Videos, designed to automatically learn audio, textual, and visual representations from the data in publicly available videos uploaded to Facebook.

By learning from videos spanning nearly every country and hundreds of languages, this project will not just help us continuously improve our core AI systems for applications like content recommendation and policy enforcement — it will enable entirely new experiences.

This is also part of our broader effort to build machines that learn as humans do: from any example, not just ones that experts have labeled.

The first application is now live in Instagram Reels’ recommendation system.

Continuously learning from the world around us is one of the hallmarks of human intelligence. Just as we quickly learn to recognize people, places, things, and actions through observation, AI systems will be smarter and more useful if they can mimic the way humans learn. In just the last couple of years, we’ve made substantial breakthroughs in self-supervised learning across several domains. These advancements have made AI systems less dependent on labeled data sets, a fundamental bottleneck on the pace of AI innovation, so that AI can start understanding the world through vast amounts of observational data, as humans do.

Every day, people around the globe share videos on Facebook products. Building AI that learns from publicly available videos will help us create machines that better analyze uncurated, real-world sights and sounds — not just examples that are part of a much smaller, hand-curated data set.

Today, we are announcing a new project called Learning from Videos, designed to learn from audio, visual, and textual input — to continuously improve our core systems and power entirely new applications. By learning from global streams of publicly available videos spanning nearly every country and hundreds of languages, our AI systems will not just improve accuracy but also adapt to our fast-moving world and recognize the nuances and visual cues across different cultures and regions. And by helping AI researchers break away from the reliance on labeled data, we can improve AI-powered products and create entirely new experiences.

AI models are successful when we meet and acknowledge the responsibility we have to honor people’s privacy. We’re building and maintaining a strong privacy foundation that uses automated solutions to enforce privacy at scale. By embedding this work at the infrastructure level, we can consistently apply privacy requirements across our systems and support efforts like AI. This includes implementing technical safeguards throughout the data lifecycle.

Although we’ve just scratched the surface, using semi- and self-supervised learning on the videos uploaded to Facebook has already improved our computer vision and speech recognition systems. Within six months of developing a state-of-the-art, self-supervised framework for video understanding, we’ve built and deployed an AI model in Instagram Reels’ recommendation system. And this is just the beginning of our Learning from Videos project. Early experiments in applying self-supervised learning to real-world videos also show a 20 percent reduction in speech recognition errors, which could improve a wide range of applications like auto-captioning and tasks that help flag harmful content like hate speech. And we’re researching ways to apply new capabilities, like multimodal video retrieval, in order to make it easier for people to surface key moments in time from their trove of digital memories.

