OpenAI’s offer to stop web crawler comes too late

Amazon, Airbnb, Ikea and other sites have blocked ChatGPT from scraping their content, but there are plenty of other bots out there.

By: HT TECH
| Updated on: Aug 26 2023, 07:11 IST

Earlier this month, in response to mounting criticism around how OpenAI scoops up data to train ChatGPT, its groundbreaking chatbot, the company made it possible for websites to block it from scraping their content. A short piece of code would tell OpenAI to go away (and it would kindly obey).

Since then, hundreds of sites have shut the door. A Google search reveals many of them: Major online properties such as Amazon, Airbnb, Glassdoor and Quora have added the code to their “robots.txt” file, a kind of rules of engagement for the many bots — or spiders as they are also known — that scour the internet.
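For readers curious what that short piece of code looks like: OpenAI's documentation names its crawler GPTBot, and a site opts out by adding a two-line rule for that name to its robots.txt file. The sketch below is illustrative only. It uses Python's standard urllib.robotparser module and a placeholder example.com URL, neither of which comes from any site named above, to show how a crawler that chooses to comply would read such a rule.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (an assumption, not copied from any real site).
# The first group turns away OpenAI's crawler; the second leaves other bots alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "SomeOtherBot" and the URL are hypothetical names used only for this example.
page = "https://example.com/articles/1"
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", page)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", page)
print(gptbot_allowed, other_allowed)
```

Nothing in the file enforces the rule. It works only if the crawler checks it and decides to obey, which is why a bot that ignores robots.txt is unaffected.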

When I got in touch with the companies, none were willing to discuss their reasoning, but it's quite obvious: They want to stop OpenAI from taking content that doesn't belong to it in order to train its artificial intelligence. Unfortunately, it's going to take a lot more than a line of code to stop that from happening.

Other online resources with the kind of data that an AI system would love also have moved to block the crawler: Furniture store Ikea, jobs site Indeed.com, vehicle comparison resource Kelley Blue Book, and BAILII, the UK's court records system, similar to the US's PACER (which doesn't appear to be blocking the bot).

Coding resources website Stack Overflow is blocking the crawler, but its rival GitHub is not, which is perhaps unsurprising given that GitHub's owner, Microsoft, is a major investor in OpenAI. And, as major media companies begin negotiating with (or possibly suing) the likes of OpenAI over access to their archives, many have also taken the step of blocking the bot. Research reported by Business Insider suggested 70 of the top 1,000 websites globally have added the code. We can expect that number to grow.

Problem solved? Not likely. While it's very generous of OpenAI to give sites the ability to prevent its robot from siphoning their content, the gesture rings hollow when you consider that OpenAI's bot has already been out there gathering this data for some time. The AI horse has very much bolted: Adding the code at this stage is like shouting “And don't come back, ya hear!” at a burglar as they disappear into the night with your belongings.

In fact, the move could serve to strengthen OpenAI's early lead. By setting this precedent, it can argue newer competitors should do the same, pulling up the ladder and enjoying the benefits of being one of AI's first movers. “What is certain is that OpenAI isn't giving the data it collected back,” noted tech worker-turned-commentator Ben Thompson in a recent edition of his email newsletter.

Of course, web crawlers are just one way in which OpenAI and other AI companies collect data to train their systems. Recent legal battles between content owners and AI companies have centered on the fact that OpenAI, Meta, Google and others often use bulk datasets provided by third parties, such as “Books3,” a dataset containing around 200,000 books, compiled by an independent AI researcher. Several authors are suing over its use.

OpenAI declined to comment, including on the question of whether sites that blocked OpenAI's web crawler could be confident OpenAI wouldn't use their data if sourced via other means. It certainly won't alter what has been scooped up already. We can take only small comfort from the fact OpenAI has acknowledged that consent is a factor in future scraping efforts. There are hundreds of other bots out there, unleashed by AI companies less well-known than OpenAI, that won't provide any kind of option for sites to opt out.

Google, which has built a rival chat tool called Bard, wants to start a discussion on the best mechanism for administering consent on AI. But as the writer Stephen King put it recently, the data is already in the “digital blender” — and there seems to be very little anyone can do about it now.

This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.

Dave Lee is Bloomberg Opinion's US technology columnist. Previously, he was a San Francisco-based correspondent at the Financial Times and BBC News.

First Published Date: 26 Aug, 07:11 IST
