Mark Zuckerberg’s Secret Weapon for AI Is Your Facebook Data

Mark Zuckerberg wants to take advantage of data on Facebook and Instagram to create powerful, general-purpose AI.

By: BLOOMBERG | Updated on: Feb 07 2024, 07:19 IST

For many people, Facebook is the internet, and the number of its users is still growing, according to Meta Platforms Inc.'s latest financial results. But Mark Zuckerberg isn't just celebrating that continuing growth. He wants to take advantage of it by using data from Facebook and Instagram to create powerful, general-purpose artificial intelligence. That sounds great, and Meta is well positioned to do it, but its billions of users may end up paying the price with their privacy and more.

Here's how Zuckerberg teased his next move in AI on Thursday:

“The next key part of our playbook is learning from unique data and feedback loops in our products… On Facebook and Instagram, there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the Common Crawl dataset and people share large numbers of public text posts in comments across our services as well.”

The point that Zuck makes here about “Common Crawl” startled observers in the tech press, because that archive is already huge: 250 billion web pages spanning 17 years. It's one of the biggest and most popular repositories of the public internet used for training AI systems today. When OpenAI launched its GPT-3 language model in 2020, close to 60% of the text used to train the system came from Common Crawl.

But Meta's data mountain is even bigger, which means it could theoretically build “smarter” AI. That's because research has shown that training AI models on more data tends to make them more capable and accurate. That formula has worked wonders for OpenAI, which over the years has increased the amount of data used to create models like ChatGPT.

If Zuckerberg wants to make a more powerful chatbot, the pile of data he's sitting on is especially valuable because so much of it comes from comment threads. Any text that represents human dialogue is critical for training so-called conversational agents, which is why OpenAI heavily mined the internet forum Reddit Inc. to build its own popular chatbot.

It's easy to scoff whenever Zuckerberg talks about a new ambition -- whether it's bots or crypto or the metaverse. His latest quixotic vision is especially grand: to build “general intelligence,” or software systems that meet or surpass human intelligence. But with all that data, Zuckerberg's quest looks doable. The problem is what the fallout could be for the rest of us.

It's odd that in the same message where Zuckerberg said that his AI team had been working on building general intelligence “for more than a decade,” he also said that Facebook would only now turn to its users' data to build models as “the next key part of our playbook.” Why hasn't Meta done that already? Perhaps because using all that data isn't so straightforward. For one thing, it would represent yet another infringement on the privacy of Facebook's 3 billion users and Instagram's 1.5 billion users. In the same way OpenAI has come under fire for scraping up the copyrighted data of artists and writers to train its models, Facebook stands to face reputational blowback for exploiting people's data all over again. That raises thorny ethical questions, and it could also require stringent data handling practices and compliance with global data protection laws, which could raise the hackles of European regulators.

The other issue is all the bias and toxicity in the data. OpenAI had to deal with this issue with Common Crawl, whose vast trove included web pages like adultmovietop100.com and adelaide-femaleescorts.webcam, according to a 2021 study by the University of Montreal. The same study says that between 4% and 6% of all the websites in Common Crawl included racial slurs, hate speech, or racially charged conspiracy theories.

While Facebook's content-moderation software has become better at blocking hate speech and conspiracy theories, it's not perfect and tends to be worse in countries outside the United States. Some of the content on Facebook that gets flagged as toxic no longer gets reviewed by a human and is left on the site. Worse: When Zuckerberg said that Meta's data was bigger than that of Common Crawl, he was likely lumping in the company's historical archive, which would include all the hyperbolic political content and fake news that were on the site before Zuckerberg took pains to clean it up.

All the work that must go into careful data handling and checking might explain why Zuckerberg has only now talked about capitalizing on the data mountain that he sits on. If he doesn't do it properly, he risks reliving the nightmare of public criticism about how Facebook has handled fake news and harmful content.

Still, if there's one thing we know about Zuckerberg, it's that he has a Caesar-like obsession with winning and domination. Last week, about 24 hours after he faced a crowd of parents in Washington, D.C., who accused him of leading their children to self-harm or even suicide, he went on to announce one of Meta's most successful financial quarters yet and tease how he'd use people's data to create powerful AI.

The proximity of those events should serve as a reminder: Facebook's path to riches has hurt many. So too might its road to building powerful AI.

First Published Date: 07 Feb, 07:19 IST

Tags: artificial intelligence mark zuckerberg
