Revolutionizing Tech at Cyber Tech Hub — Revolutionize Your Business with Cutting-Edge Tech

Researchers have openly published 2 billion Discord messages obtained through unauthorized scraping.

If you post in a public Discord server, your messages could end up being used for scientific research.

, and Administrator

July 19, 2025, 8:05 AM


Researchers Publish 2 Billion Scraped Discord Messages Online


A recent revelation has raised concerns over privacy and data compliance on Discord, the popular communication platform. According to reporting by 404 Media, researchers at the Federal University of Minas Gerais in Brazil scraped a dataset of over 2 billion messages using Discord's public API [1][2][3][4][5].

However, whether the project complied with Discord's Terms of Service remains in question. Discord's policies generally prohibit unauthorized mass data collection and require that data use respect user privacy and platform rules. While collecting public messages via the official API may be technically permitted under certain conditions, the scale of the collection and the publication of the database could breach Discord's guidelines on user data privacy and acceptable use.

Discord's Terms of Service explicitly prohibit scraping, a rule that has been in place since at least 2020 [6]. A Discord spokesperson confirmed that scraping its services without consent violates the company's Terms of Service and Community Guidelines [7].

The researchers published the dataset to provide a sizable sample of human activity for other research [8]. They anonymized the data by replacing usernames with pseudonyms, hashing and truncating identifiers, and removing potentially identifying features [9]. However, details in the conversations could still identify users, especially when conversations are pieced together [10].
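The article does not describe the researchers' actual pipeline, so the following is only a minimal sketch of what pseudonymizing usernames and hashing and truncating identifiers can look like in general. The message fields (`author`, `author_id`, `content`), the `user_N` naming scheme, the keyed hash, and the 12-character truncation are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret. A keyed hash stops outsiders from rebuilding the
# mapping simply by hashing known IDs themselves.
SECRET_KEY = b"replace-with-a-random-secret"


def hash_identifier(identifier: str, length: int = 12) -> str:
    """Return a truncated, keyed SHA-256 hex digest of an identifier."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


class Pseudonymizer:
    """Assign stable pseudonyms (user_1, user_2, ...) in order of first appearance."""

    def __init__(self) -> None:
        self._names = {}

    def pseudonym(self, username: str) -> str:
        if username not in self._names:
            self._names[username] = f"user_{len(self._names) + 1}"
        return self._names[username]


def anonymize(message: dict, names: Pseudonymizer) -> dict:
    """Return a copy of a message with the author name and ID replaced."""
    return {
        "author": names.pseudonym(message["author"]),
        "author_id": hash_identifier(message["author_id"]),
        "content": message["content"],  # free text is left untouched
    }
```

Note that a sketch like this leaves the message text itself unchanged, which illustrates why details inside conversations can still point back to a specific person even after names and IDs are replaced.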

Discord's initial investigation determined that the accounts involved accessed servers that were discoverable and widely accessible, and scraped data without permission [11]. The dataset was collected from Discord servers between 2015 and 2024 and covers about 10% of the platform's open servers [12].

The researchers hope that the data will help explore the impact of digital platforms on political discourse, the propagation of misinformation, and the development of effective moderation and regulation strategies [13]. Potential applications of the data include discourse analysis, studying the relationship between social media and mental health, and training AI chatbots [14].

Discord is currently investigating the matter and says it will take appropriate enforcement actions [15]. The researchers' project may violate Discord's rules because they scraped data without written consent [16]. The incident is a reminder to be cautious about what you share on digital platforms, since it may later be read or used by others.

[1] [URL for the dataset publication]
[2] [URL for the research paper]
[3] [URL for the Discord API documentation]
[4] [URL for Discord's Terms of Service]
[5] [URL for Discord's Community Guidelines]
[6] [URL for Discord's Terms of Service, section outlining the prohibition on scraping]
[7] [URL for Discord's official statement on the matter]
[8] [URL for the researchers' statement on the purpose of the dataset]
[9] [URL for the researchers' description of the data anonymization process]
[10] [URL for the researchers' acknowledgement of potential user identification]
[11] [URL for Discord's initial investigation findings]
[12] [URL for information on the timeframe of the data collection]
[13] [URL for the researchers' goals for the data]
[14] [URL for potential applications of the data]
[15] [URL for Discord's statement on the ongoing investigation]
[16] [URL for Discord's statement on the potential violation of their rules]

The enormous scraped dataset of Discord messages raises questions about data compliance and adherence to tech companies' terms of service, since mass data collection can breach privacy rules. Discord's Terms of Service, which have forbidden scraping without consent since at least 2020, make the project by researchers at the Federal University of Minas Gerais, as reported by 404 Media, a possible violation. Coverage by Gizmodo and other tech publications highlights the importance of understanding and respecting platform rules and data regulations when collecting data at scale.
