Does ChatGPT plagiarize beyond 'copy-paste'?

Concerns about plagiarism are raised when language models, presumably including ChatGPT, paraphrase and reuse concepts from training data without citing the original source.

By: ANI
Updated on: Feb 21 2023, 09:43 IST
FILE PHOTO: A response by ChatGPT, an AI chatbot developed by OpenAI, is seen on its website in this illustration picture taken February 9, 2023. REUTERS/Florence Lo/Illustration/File Photo

Before finishing their next assignment with a chatbot, students might want to give it some thought. According to a Penn State-led research team that conducted the first study to look specifically at the topic, language models that generate text in response to user prompts plagiarise content in more ways than one.

"Plagiarism comes in different flavours," said Dongwon Lee, professor of information sciences and technology at Penn State. "We wanted to see if language models not only copy and paste but resort to more sophisticated forms of plagiarism without realizing it."

The researchers focused on identifying three forms of plagiarism: verbatim, or directly copying and pasting content; paraphrasing, or rewording and restructuring content without citing the original source; and idea, or using the main idea from a text without proper attribution. They constructed a pipeline for automated plagiarism detection and tested it against OpenAI's GPT-2 because the language model's training data is available online, allowing the researchers to compare generated texts to the 8 million documents used to pre-train GPT-2.
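
To make the simplest of the three categories concrete, the sketch below flags verbatim overlap by looking for word n-grams shared between a generated text and a candidate source document. It is only an illustration and not the researchers' pipeline: the function names, the tokenizer and the 5-word window are all assumptions, and paraphrase or idea plagiarism would need semantic comparison that this check does not attempt.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shared_ngrams(generated, source, n=5):
    """Return the set of word n-grams that occur in both texts.

    n=5 is an illustrative window size, not a value from the study.
    """
    def ngrams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    return ngrams(tokenize(generated)) & ngrams(tokenize(source))


def has_verbatim_overlap(generated, source, n=5):
    """True if at least one n-word sequence was copied word for word."""
    return bool(shared_ngrams(generated, source, n))


if __name__ == "__main__":
    source = "A quick brown fox jumps over the lazy dog"
    copied = "The model said the quick brown fox jumps over the lazy dog today"
    reworded = "A fast brown fox leaps over a sleepy dog"
    print(has_verbatim_overlap(copied, source))
    print(has_verbatim_overlap(reworded, source))
```

The reworded sentence shares no five-word run with the source, so a check like this would miss it, which is why the study treats paraphrase and idea plagiarism as separate categories.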

The scientists used 210,000 generated texts to test for plagiarism in pre-trained language models and fine-tuned language models, or models trained further to focus on specific topic areas. In this case, the team fine-tuned three language models to focus on scientific documents, scholarly articles related to COVID-19, and patent claims. They used an open-source search engine to retrieve the top 10 training documents most similar to each generated text and modified an existing text alignment algorithm to better detect instances of verbatim, paraphrase and idea plagiarism.
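
The retrieval step can be pictured with a small sketch that ranks training documents by similarity to a generated text and keeps the top 10. The study used an open-source search engine for this; the bag-of-words cosine similarity below is a simplified stand-in, and every name in it (`bag_of_words`, `cosine_similarity`, `top_k_similar`) is hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens in the text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(generated, training_docs, k=10):
    """Return (index, score) pairs for the k most similar training documents.

    Ties are broken by document index so the ordering is deterministic.
    """
    query = bag_of_words(generated)
    scored = [
        (cosine_similarity(query, bag_of_words(doc)), index)
        for index, doc in enumerate(training_docs)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(index, score) for score, index in scored[:k]]


if __name__ == "__main__":
    docs = [
        "patent claims for a battery",
        "covid vaccine trial results",
        "the vaccine trial showed strong results",
    ]
    for index, score in top_k_similar("vaccine trial results", docs, k=2):
        print(index, round(score, 3))
```

In a setup like this, each retrieved candidate would then be passed to a text alignment step to decide whether the overlap counts as verbatim, paraphrase or idea plagiarism.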

The team found that the language models committed all three types of plagiarism, and that plagiarism occurred more often as the training dataset and the number of model parameters grew. They also noted that fine-tuning the language models reduced verbatim plagiarism but increased instances of paraphrase and idea plagiarism. In addition, they identified instances of the language model exposing individuals' private information through all three forms of plagiarism. The researchers will present their findings at the 2023 ACM Web Conference, which takes place from April 30 to May 4 in Austin, Texas.

"People pursue large language models because the larger the model gets, generation abilities increase," said lead author Jooyoung Lee, a doctoral student in the College of Information Sciences and Technology at Penn State. "At the same time, they are jeopardizing the originality and creativity of the content within the training corpus. This is an important finding."

The study highlights the need for more research into text generators and the ethical and philosophical questions that they pose, according to the researchers.

"Even though the output may be appealing, and language models may be fun to use and seem productive for certain tasks, it doesn't mean they are practical," said Thai Le, assistant professor of computer and information science at the University of Mississippi who began working on the project as a doctoral candidate at Penn State. "In practice, we need to take care of the ethical and copyright issues that text generators pose."

Though the results of the study apply only to GPT-2, the automatic plagiarism detection process that the researchers established can be applied to newer language models like ChatGPT to determine whether and how often these models plagiarize training content. Testing for plagiarism, however, depends on the developers making the training data publicly accessible, said the researchers.

The current study can help AI researchers build more robust, reliable and responsible language models in future, according to the scientists. For now, they urge individuals to exercise caution when using text generators.

"AI researchers and scientists are studying how to make language models better and more robust, meanwhile, many individuals are using language models in their daily lives for various productivity tasks," said Jinghui Chen, assistant professor of information sciences and technology at Penn State. "While leveraging language models as a search engine or a Stack Overflow to debug code is probably fine, for other purposes, since the language model may produce plagiarized content, it may result in negative consequences for the user."

The plagiarism finding is not unexpected, added Dongwon Lee.

"As a stochastic parrot, we taught language models to mimic human writings without teaching them how not to plagiarize properly," he said. "Now, it's time to teach them to write more properly, and we have a long way to go." (ANI)

First Published Date: 21 Feb, 09:43 IST
