Google rolls out tool for publishers to opt out of AI data training, but not search

After months of controversy over Google and other tech giants using publishers' data to train their AI tools, Google has rolled out a way for publishers to opt out.

Updated on: Sep 29 2023, 12:27 IST
Google's new tool, Google-Extended, allows publishers to opt out of having their content used as AI training data, providing greater control over content use. (AFP)

Google has unveiled a new feature, Google-Extended, that lets website publishers exclude their data from the development of Google's AI models. Websites that opt out will remain accessible through Google Search, but publishers gain greater control over whether their content is used for AI training. In effect, Google will stop using the data of publishers who opt out.

Managing AI Contribution

The move addresses concerns among web publishers who want to keep their data from being used in AI model training. Google-Extended lets publishers manage whether their websites help improve Bard and the Vertex AI generative APIs. Publishers can now control how their content is accessed for this purpose, The Verge reported.

Balancing Visibility and Data Protection

Earlier this year, Google confirmed that it was training its AI chatbot, Bard, on publicly available data scraped from the web. The announcement sparked concern and prompted publishers to look for ways to shield their content from AI training, much as major publishers such as the New York Times, CNN, Reuters, and Medium have done with other AI crawlers.

Unlike most other web crawlers, Google's crawler is integral to a website's discoverability in search results, so blocking it completely could hurt a website's online presence. To get around this, some publishers have resorted to legal measures, such as updating their terms of service to prohibit companies from using their content for AI training.

Google-Extended is available through robots.txt, a file that tells web crawlers which parts of a site they can access. As AI applications continue to expand, Google says it will explore additional machine-readable options that offer more choice and control to web publishers, and it expects to share more in the near future.
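For publishers wondering what this looks like in practice, here is a minimal sketch of a robots.txt group that opts a whole site out of Google-Extended, checked with Python's standard-library robots.txt parser. The domain example.com, the sample rules and the helper name is_allowed are illustrative assumptions, not taken from Google's announcement.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: it disallows the
# Google-Extended token everywhere and leaves all other agents untouched.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/some-article"  # placeholder URL
    print("Google-Extended:", is_allowed("Google-Extended", page))
    print("Googlebot:", is_allowed("Googlebot", page))
```

With rules like these, the Google-Extended token is refused for every path while Googlebot, which handles Search indexing, is still permitted, which mirrors the split the article describes.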

In short, Google's introduction of Google-Extended provides publishers with a valuable tool to safeguard their data from contributing to AI model training while still benefiting from Google Search's indexing capabilities. This development marks a significant step toward addressing concerns regarding the use of web content for AI training and ensuring greater transparency and control for publishers.


First Published Date: 29 Sep, 11:39 IST