Meet DarkBERT, the dark web-trained AI tool that can combat cybersecurity threats

A new large language model trained on a dataset of dark web pages is in the works, and it could help combat cybersecurity threats. Here’s what you need to know.

Updated on: Jun 20 2023, 12:56 IST
DarkBERT is built on the RoBERTa architecture. (Unsplash)

Large Language Models (LLMs) have gained massive popularity over the past few months, especially since the emergence of AI chatbots like ChatGPT. Generative AI models of this kind learn patterns from existing data and use them to produce new content, such as text, images, and audio. While such tools have mostly been used to generate content, researchers have now developed what they describe as a first-of-its-kind LLM to assess and combat cybersecurity threats. Interestingly, this model was trained specifically on information from the dark web.

What is DarkBERT?

DarkBERT is an encoder model that adopts the transformer-based RoBERTa architecture. Instead of training it on the regular web, researchers trained this LLM on a vast dataset of dark web pages, drawing on sources such as hacker forums, scam websites, and other criminal corners of the internet. In a paper titled ‘DarkBERT: A language model for the dark side of the Internet’, which is yet to be peer-reviewed, its creators say that DarkBERT can revolutionize the fight against cybercrime by finding and analyzing the elusive domains of the Internet that remain hidden from search engines.

While the dark web is usually concealed and inaccessible to the general public, researchers used the Tor network to access its pages and collect data from them. The data then went through several steps, including deduplication, category balancing, and pre-processing, to produce a refined dataset of dark web text. This dataset was finally fed to RoBERTa, and the training that produced DarkBERT took 15 days.
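The three data-preparation steps named above can be pictured with a small sketch. This is not the researchers' pipeline: the function names, the `category` and `text` fields, the hash-based duplicate check, and the per-category cap are all assumptions made here for illustration, and the sample pages are invented.

```python
import hashlib
import random
import re
from collections import defaultdict


def preprocess(text):
    """Lowercase and collapse whitespace (a stand-in for real cleaning)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(pages):
    """Keep the first page seen for each distinct normalized text."""
    seen = set()
    unique = []
    for page in pages:
        digest = hashlib.sha256(preprocess(page["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


def balance_categories(pages, max_per_category, seed=0):
    """Cap every category at max_per_category pages, sampled reproducibly."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for page in pages:
        by_category[page["category"]].append(page)
    balanced = []
    for category in sorted(by_category):
        group = by_category[category]
        if len(group) > max_per_category:
            group = rng.sample(group, max_per_category)
        balanced.extend(group)
    return balanced


def build_corpus(pages, max_per_category):
    """Deduplicate, balance, then clean the text of each remaining page."""
    unique = deduplicate(pages)
    balanced = balance_categories(unique, max_per_category)
    return [preprocess(page["text"]) for page in balanced]


# Invented sample pages, purely for demonstration.
pages = [
    {"category": "forum", "text": "Example   forum post"},
    {"category": "forum", "text": "example forum post"},  # duplicate once normalized
    {"category": "forum", "text": "Another forum post"},
    {"category": "forum", "text": "A third forum post"},
    {"category": "market", "text": "Example listing"},
]
corpus = build_corpus(pages, max_per_category=2)
```

Capping each category keeps one heavily represented kind of page from dominating the training text, which is the general idea behind category balancing.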

Cybersecurity applications

Since it is trained on a dataset of dark web pages, DarkBERT has the potential for a wide range of cybersecurity applications. It can help monitor illicit activities and bolster cybersecurity measures. It can also “combat the extreme lexical and structural diversity of the Dark Web that may be detrimental to building a proper representation of the domain,” according to the research paper.

It can automate the process of monitoring dark web forums where unlawful information is usually shared. DarkBERT can also detect websites that are involved in leaking sensitive or confidential data and selling ransomware.

Lastly, DarkBERT uses the fill-mask function of BERT-family language models to detect and filter out phrases linked to criminal activity, which can help identify and tackle new cyber threats.
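For readers unfamiliar with fill-mask: one word in a sentence is hidden and the model is asked to predict it from the surrounding words. The sketch below is only a toy, count-based stand-in for that idea, not a transformer and not DarkBERT itself. The training sentences, the `<mask>` placeholder handling, and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

MASK = "<mask>"


def train_context_counts(sentences):
    """Count which words appear between each (left, right) pair of neighbors."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            counts[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    return counts


def fill_mask(counts, sentence, top_k=3):
    """Return the most frequent words seen in the masked position's context."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    i = tokens.index(MASK)
    context = (tokens[i - 1], tokens[i + 1])
    return [word for word, _ in counts.get(context, Counter()).most_common(top_k)]


# Invented training sentences, purely for demonstration.
sentences = [
    "selling leaked passwords today",
    "selling leaked passwords cheap",
    "selling stolen passwords today",
    "selling fresh bread today",
]
counts = train_context_counts(sentences)
predictions = fill_mask(counts, "selling <mask> passwords now")
```

Here `predictions` lists the words the toy model most often saw between "selling" and "passwords". A real masked language model plays the same guessing game with learned representations instead of raw counts, which is how its guesses can surface terms associated with a given context.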

First Published Date: 20 Jun, 12:55 IST