OpenAI’s offer to stop web crawler comes too late

Amazon, Airbnb, Ikea and other sites have blocked OpenAI's crawler from scraping their content, but there are plenty of other bots out there.

By: HT TECH
| Updated on: Aug 26 2023, 07:11 IST
Many websites, including Amazon, Airbnb, and Quora, have blocked OpenAI's web crawler from scraping their content to train its AI chatbot. (AP)

Earlier this month, in response to mounting criticism around how OpenAI scoops up data to train ChatGPT, its groundbreaking chatbot, the company made it possible for websites to block it from scraping their content. A short piece of code would tell OpenAI to go away (and it would kindly obey).

Since then, hundreds of sites have shut the door. A Google search reveals many of them: Major online properties such as Amazon, Airbnb, Glassdoor and Quora have added the code to their “robots.txt” file, a kind of rules of engagement for the many bots — or spiders as they are also known — that scour the internet.
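
As a rough illustration of how such a rule works, the sketch below parses a robots.txt file and asks whether a given crawler may fetch a page. The article does not quote the code or name the crawler; the `GPTBot` user-agent token is the one OpenAI has published for its crawler, while the file contents, the `is_blocked` helper and the `example.com` URL are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not taken from any real site).
# "GPTBot" is assumed here as the user-agent token for OpenAI's crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("GPTBot", "Googlebot"):
        print(bot, "blocked:", is_blocked(ROBOTS_TXT, bot))
```

Note that robots.txt is only a request: it works because a well-behaved crawler chooses to honor it, which is why the rule cannot stop bots that ignore it.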

When I got in touch with the companies, none were willing to discuss their reasoning, but it's quite obvious: They want to stop OpenAI from taking content that doesn't belong to it in order to train its artificial intelligence. Unfortunately, it's going to take a lot more than a line of code to stop that from happening.

Other online resources with the kind of data that an AI system would love have also moved to block the crawler: Furniture store Ikea, jobs site Indeed.com, vehicle comparison resource Kelley Blue Book, and BAILII, the UK's court records system, similar to the US's PACER (which doesn't appear to be blocking the bot).

Coding resources website Stack Overflow is blocking the crawler, but its rival GitHub is not, which is perhaps unsurprising given that GitHub's owner, Microsoft, is a major investor in OpenAI. And, as major media companies begin negotiating with (or possibly suing) the likes of OpenAI over access to their archives, many have also taken the step to block the bot. Research reported by Business Insider suggested 70 of the top 1,000 websites globally have added the code. We can expect that number to grow.

Problem solved? Not likely. While it's very generous of OpenAI to give sites the ability to prevent its robot from siphoning their content, the gesture rings hollow when you consider that OpenAI's bot has already been out there gathering this data for some time. The AI horse has very much bolted: Adding the code at this stage is like shouting “And don't come back, ya hear!” at a burglar as they disappear into the night with your belongings.

In fact, the move could serve to strengthen OpenAI's early lead. By setting this precedent, it can argue newer competitors should do the same, pulling up the ladder and enjoying the benefits of being one of AI's first movers. “What is certain is that OpenAI isn't giving the data it collected back,” noted tech worker-turned-commentator Ben Thompson in a recent edition of his email newsletter.

Of course, web crawlers are just one way in which OpenAI and other AI companies collect data to train their systems. Recent legal battles between content owners and AI companies have centered on the fact that OpenAI, Meta, Google and others often use bulk datasets provided by third parties, such as “Books3,” a dataset containing around 200,000 books, compiled by an independent AI researcher. Several authors are suing over its use.

OpenAI declined to comment, including on the question of whether sites that blocked OpenAI's web crawler could be confident OpenAI wouldn't use their data if sourced via other means. It certainly won't alter what has been scooped up already. We can take only small comfort from the fact OpenAI has acknowledged that consent is a factor in future scraping efforts. There are hundreds of other bots out there, unleashed by AI companies less well-known than OpenAI, that won't provide any kind of option for sites to opt out.

Google, which has built a rival chat tool called Bard, wants to start a discussion on the best mechanism for administering consent on AI. But as the writer Stephen King put it recently, the data is already in the “digital blender” — and there seems to be very little anyone can do about it now.

This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.

Dave Lee is Bloomberg Opinion's US technology columnist. Previously, he was a San Francisco-based correspondent at the Financial Times and BBC News.

First Published Date: 26 Aug, 07:11 IST