Twitter has set itself apart as a direct communication medium to a very large audience. Tweets are precise and convey the message in a snappy manner and this has led to Twitter becoming so popular that it is now influencing global landscapes. In this post, we are going to discuss the reliable sources to acquire free Twitter datasets.
Download any required Twitter dataset using Trackmyhashtag:
It’s easy to download or get access to a Twitter dataset for any search term or criteria. Just follow these simple steps:
- Go to this page Twitter dataset
- Click on the button “Request Data”
- Fill in the form with your email address and search term, select the date range, and submit.
After a while, you’ll receive your Twitter dataset and details associated with it.
Twitter data sets can be effectively utilized in the areas of academic research, social projects, and studying marketing methodologies. I have compiled archives of various free Twitter datasets accumulated from various sources which can be very effective for someone looking for a reliable source of Twitter data sets.
I have also mentioned a method to get specific historical Twitter data of any kind, but before that, let’s discuss all the reliable sources to download free Twitter datasets.
Type- Corona Virus (Covid-19) Tweet Metadata Compilation 2020
The Twitter Dataset comprises 60K random tweets from public Twitter accounts related to the search term “Covid-19”. The dataset was amassed using Twitter’s Stream API over a period of 8 weeks (1st Dec 2019 – 28 Jan 2020). The dataset is available in Excel/CSV format and is segmented into 3 fields, namely tweet data, images, and videos.
Type: VoterFraud 2020 dataset
The dataset was accumulated to analyze rumors surrounding voter fraud in the 2020 election. It contains 7.6 million original tweets and 25.6 million retweets from 2.6 million unique users related to voter fraud claims.
Type: COVID-19 tweets
The Twitter dataset contains tweets related to COVID-19. The data has been collected continuously since January 22, 2020. The dataset was accumulated for research purposes (the study of online conversation dynamics in the context of a planetary-scale epidemic outbreak).
Type: Corona-virus tweets
The dataset contains over 433 million tweets collected by an ongoing project deployed at https://live.rlamsal.com.np. The model monitors real-time Twitter feeds for Corona-virus related tweets using 90+ keywords and hashtags.
Type: 2016 presidential election
This dataset was released by Twitter on October 17, 2018, to provide transparency to state-sponsored propaganda that was alleged to have occurred on the platform in the lead up to and directly following the 2016 presidential election.
Type- COVID-19 150 million tweets
The Twitter dataset was collected through the stream and includes tweet metadata from all languages. The prevailing languages include English, Spanish, and French. The dataset includes tweets and retweets (152,920,832 tweets) in a .tsv file. The link also includes a cleaned dataset with no retweets (30,990,645 unique tweets) in a .tsv file format.
Type- Tweet analysis of top 50 Twitter profiles 2020
This Twitter dataset comprises the past 3200 Tweets each of the Top 50 Twitter profiles on Twitter for the year 2020 in a raw Excel/CSV format. The data also provides comprehensive PDF analytical reports for each Twitter profile.
Type- Miscellaneous research data (2013-2018).
This is a collection of free Twitter datasets gathered through the stream for sentiment analysis, research, history, testing, and data retention. We can go through loads of data in this archive and purposefully select the stream we need. These archives have loads of data that can be sorted and used as needed. The Twitter datasets available here can be downloaded for free.
Type- MNC’s Twitter accounts and influential people.
Data.world is a free Twitter dataset repository. Users can find datasets ranging from companies to influential individuals. We can simply head over to the website and browse through their collection of Twitter datasets.
Type- Russian troll tweets to celebrity accounts.
Like all things on Github, this is a free data repository. The Twitter datasets range from Elon Musk tweets to Russian troll tweets. Users can simply head over to the mentioned URL and browse through their vast collection of Twitter datasets.
Type- Scientific research data.
Kaggle is a free online repository for sharing codes, scientific data, and Twitter datasets as well. There is a huge collection of Twitter datasets submitted by users that are available to download for free. The data ranges from environmental studies to tweets from demonetization in India.
Type- Academic research data.
ICWSM is a data-sharing initiative that has a vast collection of Twitter datasets. The collection is free to download. Users only have to register on the website and sign a disclosure under which they agree not to share the report. These data sets can be extremely beneficial in the field of academic research.
Type- Data related to real-world events.
This collection includes 30 different Twitter data sets associated with real-world events, collected between 2012 and 2016 using the streaming API with a set of keywords. As per Twitter TOS, this data is available for non-commercial purposes only.
Type- Old Twitter data from October 2010.
This dataset contains tweets that were posted on Twitter in October 2010. Although quite old, this might still be relevant to data miners and academicians. Just click on the link to download the dataset.
Type- Sample of 16 million unfiltered tweets.
This archive consists of approximately 16 million tweets sampled between January 23rd and February 8th. This is an unfiltered archive and consists of both important and spam tweets. You just need to sign a disclosure agreeing not to use the data for commercial purposes, and after that you can download the archive right away.
Kdnuggets is a multi-centric portal that provides information on jobs, relevant courses, webinars, and free downloadable Twitter datasets as well. You can go directly to the link provided and browse through their collection of datasets.
17. Github troll tweets
Type- Russian troll tweets.
This Github archive provides a large dataset of Russian troll tweets. All the datasets are readily downloadable in CSV format.
18. Github scraped public tweets
Type- Miscellaneous public tweets.
This Twitter dataset is a collection of scraped public Twitter updates used in coordination with an academic project to study the geolocation data related to the tweets.
19. Mega.NZ Reddit data set
Type- Reddit comments data set.
This dataset covers all of Reddit’s publicly available comments and can be used for massive analytical research. The file size is about 250 GB compressed and over 1 TB uncompressed. The link provided points to the torrent file, which can be easily downloaded using a torrent client.
20. Kaggle customer support data sets
Type- Customer support tweets.
This dataset consists of over 3 Million tweets by customer support of various big brands and companies. This can be used in understanding conversational models, and for the study of modern customer support practices and impact.
Type- Tweets from NASDAQ companies to UK geolocation tweet data.
Follow the hashtag provides a collection of Twitter data sets ranging from the top 100 NASDAQ companies to UK geolocation tweet data. Just click on the link to browse the datasets.
Lionbridge provides a comprehensive list of Twitter datasets which range from everyday news to tweets with the hashtag #Avengersendgame and so on. Just click on the link and browse through the list of their available datasets.
23. Academic torrents
Type- URL’s posted on Twitter in October 2010.
This dataset consists of URLs that were posted on Twitter in October 2010. The link will take you to the Torrent file which can be easily downloaded through a torrent client.
Type- Tweet sentiment analysis data.
Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter. It filters through the tweets by understanding the negativity or positivity of the tweet or comment by analyzing emoticons.
Docnow provides catalogs of Twitter datasets that are publicly available on the web. If you would like to turn this tweet identifier data back into the original JSON format, first download the data sets and then use the Hydrator desktop application, or Twarc if you are comfortable working at the command line.
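Before hydrating a tweet ID dataset, it often helps to clean the ID file and split it into batches. The snippet below is a minimal sketch of that preparation step only; it does not call Hydrator, Twarc, or the Twitter API. The function names `read_tweet_ids` and `batched` are hypothetical, and the default batch size of 100 is an assumption that you should adjust to whatever limit your hydration tool or API plan imposes.

```python
from typing import Iterable, Iterator, List


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Keep only lines that are purely numeric (assumed: one tweet ID per line)."""
    ids = []
    for line in lines:
        candidate = line.strip()
        if candidate.isdigit():
            ids.append(candidate)
    return ids


def batched(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs (size is an assumed limit)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Stand-in for the lines of a downloaded ID file; the IDs are made up.
sample_lines = ["1234567890123456789\n", "\n", "not-an-id\n", "9876543210987654321\n"]
tweet_ids = read_tweet_ids(sample_lines)
batches = list(batched(tweet_ids, size=1))
print(tweet_ids)
print(len(batches))
```

With a real file, you could pass an open file handle to `read_tweet_ids` instead of the sample list.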
26. Harvard dataverse
Type- USA presidential election tweets.
This Twitter dataset contains the tweet ids of approximately 280 million tweets related to the 2016 United States presidential election. They were collected between July 13, 2016, and November 10, 2016, from the Twitter API using Social Feed Manager.
This Twitter dataset contains all 40,815,975 tweets matching at least one of the following 45 keywords that were posted between June 1, 2014, and May 31, 2015, and had not been deleted or protected as of July 2015. Head over to the link to find the list of the 45 keywords and download the data.
Type- Miscellaneous public tweets.
The dataset is a collection of 1.47 billion social relations, 4,262 trending topics, and 106 million tweets obtained from 41.7 million Twitter user profiles. It was used in a study to identify trending topics, identify influencers, rank profiles based on the size of followers or retweets and, analyze temporal behavior along with user participation.
Type- Miscellaneous Twitter Retweets
The dataset contains user-to-user links from Twitter and different retweeting variations (RT, via, retweeting, retweet, HT, R/T, and the recycling symbol) per day. The data was accumulated to conduct a study aimed at visualizing the media landscape, discovering topic authorities, crowd-sourced opinions, identifying topical content, and characterizing information trade on Twitter.
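To make the retweet variations listed above concrete, here is a rough sketch of how such markers could be detected in tweet text. The pattern and the function name `find_retweet` are hypothetical illustrations, not the rules used to build the dataset, and the sketch assumes the marker is followed directly by an @mention.

```python
import re
from typing import Optional, Tuple

# Markers taken from the list above; \u267B is the recycling symbol.
# The (?<!\w) lookbehind prevents matching "rt" inside words such as "smart".
MARKER_PATTERN = re.compile(
    r"(?<!\w)(?P<marker>retweeting|retweet|RT|R/T|HT|via|\u267B)\s*:?\s*@(?P<user>\w+)",
    re.IGNORECASE,
)


def find_retweet(text: str) -> Optional[Tuple[str, str]]:
    """Return (marker, credited_user) for the first marker found, else None."""
    match = MARKER_PATTERN.search(text)
    if match is None:
        return None
    return match.group("marker").lower(), match.group("user")


print(find_retweet("RT @alice: hello"))
print(find_retweet("great read (via @bob)"))
print(find_retweet("smart @carol"))
```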
Type- Miscellaneous public tweets
The data consists of tweets from Twitter users. The file is available in Excel/CSV format and is divided into six fields (tweet id, date of the tweet, popularity of the tweet, the query (LyX), the user that tweeted, and the text of the tweet).
Type- Miscellaneous Tweets
The Twitter dataset contains 1,578,627 tweets from public Twitter profiles, amassed to perform a Twitter sentiment analysis by the University of Michigan.
Type- Miscellaneous public tweets
The link provides various Twitter datasets collected by the LSTM model to perform sentiment analysis. The data is available as SQLite .db files. Each .db file contains three columns: the date and time, the tweet, and the sentiment score for the tweet.
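Files like these can be queried with Python's built-in `sqlite3` module. The sketch below uses an in-memory database as a stand-in for a downloaded .db file. The table name `sentiment`, the column names, and the timestamp format are assumptions, so inspect the real schema (for example via the `sqlite_master` table) before adapting it.

```python
import sqlite3

# In-memory stand-in for a downloaded .db file.
# Table and column names are hypothetical; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sentiment (created_at TEXT, tweet TEXT, score REAL)")
conn.executemany(
    "INSERT INTO sentiment VALUES (?, ?, ?)",
    [
        ("2018-01-01 10:00:00", "great day", 0.8),
        ("2018-01-01 11:00:00", "terrible service", -0.6),
        ("2018-01-02 09:30:00", "it was fine", 0.1),
    ],
)

# Average sentiment score per calendar day (first 10 characters of the timestamp).
rows = conn.execute(
    "SELECT substr(created_at, 1, 10) AS day, AVG(score) "
    "FROM sentiment GROUP BY day ORDER BY day"
).fetchall()
conn.close()

for day, avg_score in rows:
    print(day, round(avg_score, 3))
```

For a real file, replace `":memory:"` with the path to the .db file and drop the `CREATE TABLE` and `INSERT` statements.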
Type- Kavanaugh Twitter Dataset
This Twitter dataset was collected using the Twitter API over 3 weeks (Sept-22 to Oct 9, 2018). The dataset contains a total of 56 million tweets from 3.2 million unique accounts. The following keywords were tracked for this Twitter data: #Kavanaugh, “Supreme Court”, #KavanaughHearings, #KavanaughNomination.
Type- Movie Rating Tweets
This is a Twitter dataset consisting of ratings on movies contained in well-structured Tweets on Twitter. Daily, the Twitter API is queried for the term “I rated #IMDB”. Through a series of regular expressions, relevant information such as user, movie, and rating is extracted. The ratings are then cross-referenced with the IMDB page to provide the genre metadata.
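The extraction step described above can be illustrated with a single regular expression. This is only a sketch: the tweet layout assumed here ("I rated TITLE N/10 URL #IMDb") and the function name `parse_rating_tweet` are assumptions, not the dataset authors' actual expressions.

```python
import re
from typing import Dict, Optional

# Assumed layout: "I rated <title> <n>/10 <imdb url> #IMDb"
RATING_PATTERN = re.compile(
    r"I rated (?P<title>.+?) (?P<rating>\d{1,2})/10\s+"
    r"https?://\S*imdb\.com/title/(?P<imdb_id>tt\d+)",
    re.IGNORECASE,
)


def parse_rating_tweet(user: str, text: str) -> Optional[Dict[str, object]]:
    """Extract user, movie title, rating, and IMDb id, or None if no match."""
    match = RATING_PATTERN.search(text)
    if match is None:
        return None
    return {
        "user": user,
        "movie": match.group("title"),
        "rating": int(match.group("rating")),
        "imdb_id": match.group("imdb_id"),
    }


example = "I rated The Matrix 9/10 http://www.imdb.com/title/tt0133093/ #IMDb"
parsed = parse_rating_tweet("some_user", example)
print(parsed)
```

The extracted `imdb_id` is the piece you would use to look up genre metadata on the corresponding IMDB page.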
Type- 2017 German Elections Raw Tweet Data
The Twitter dataset contains Twitter interactions related to German politicians of influential political parties for several months in the pre-phase of the German election campaigns 2017. The dataset comprises raw data of more than 120,000 active users generating more than 1,200,000 tweets.
Type- Dataset to Detect Hazardous Events in Rome
This dataset uses 276,865 tweets retrieved from the Twitter stream between May 2018 and May 2019. Each tweet is identified by its ID, which can be used to retrieve the text content, GPS information (if available), location, and time. Each tweet in the collection is labeled after the event detection phase is complete. The Baths of Diocletian Twitter account must tweet all important or potentially hazardous information on the incident in Rome.
Type- General Twitter Stream
The link provides various collections of Twitter datasets in JSON format accumulated from general Twitter streams. The data was collected for research, history, testing, and memory.
Type- Gender Classifier Data
The Twitter dataset was used to train a CrowdFlower AI gender predictor. The dataset contains 20,000 rows, each with a user name, a random tweet, account profile and image, location, and even link and sidebar color.
Type: General Twitter stream
The archive is a simple collection of tweets in JSON format accumulated from a general Twitter stream. The Twitter dataset was amassed for research, history, testing, and memory.
Type: US airlines
The Twitter dataset is an accumulation of tweets related to every major US airline from February 2015. The tweets were amassed to perform sentiment analysis. Contributors were asked to classify tweets as positive, negative, or neutral, and then to categorize the reasons behind the negative ones.
Type: Image sentiment analysis
This Twitter dataset is an accumulation of tweets and related images to perform a study on cross-media learning for image sentiment analysis. The data collection process started in July and ended in December 2016.
Type: Hate speech classification
The Twitter dataset was accumulated to detect hate speech in tweets. The study aimed at creating training data for classifying racist, sexist, or any kind of hate speech.
Type: Rumour scheme dataset
This is a Twitter dataset collected and annotated within the journalism use case of the PHEME FP7 project. The rumors are associated with 9 different breaking news events. It was created to perform an analysis of social media rumors and contains Twitter conversations that were initiated by a rumourous tweet.
Type: Apple INC
The data set is a compilation of tweets related to Apple INC. It was accumulated to analyze the sentiments of users surrounding Apple INC.
Type: Collection of datasets
The dataset is a collection of 87 different Twitter datasets. The datasets include tweets from Elon Musk, Donald Trump, the First GOP debate, and more.
46. Open Source Twitter Data
Type: Collection of datasets
A list of Twitter datasets and related resources released by creativecommas.org.
47. Digital Library
Type: Democratic National Convention Philadelphia Tweets 2016
The dataset comprises tweets related to the 2016 Democratic National Convention Philadelphia and contains over 15 thousand tweets.
48. Word Mapper
Type: US geocoded tweets
The Twitter dataset contains over 890 million geocoded tweets collected from across the contiguous United States between 11 October 2013 and 22 November 2014.
Type: FIFA World Cup Tweets
The dataset contains tweets related to the FIFA World Cup Qualifiers in Brazil 2014 and Russia 2018.
50. Digital Library Repository
Type: 2018 Texas Debate Twitter Dataset
The Twitter dataset is a compilation of tweets related to the United States Senate race between Beto O’Rourke and Ted Cruz. The dataset contains tweets captured around the first debate on September 21, 2018.
Type: Corona Virus Dataset
The dataset is an accumulation of tweets specific to the worldwide Corona Virus pandemic hashtags (#coronavirus, #covid19, and more). The tweets collected are from March and April.
Type: COVID-19 Twitter chatter dataset
The dataset contains COVID-19 tweets from January 27th and is frequently updated. The dataset contains tweets from all languages but the prevailing languages are English, Spanish, and French. It includes over 728,212,033 unique tweets. It also has a cleaned version of the dataset with 174,949,501 unique tweets.
Type: COVID-19 Geo-tagged tweets
This Twitter dataset contains worldwide COVID-19 geotagged tweets. The files contain the distribution of tweets collected for the AIDR system, along with the geo-coordinates from Twitter.
Type: Corona Virus Tweet Ids
This dataset contains the tweet ids of 239,861,658 tweets related to the worldwide COVID-19 pandemic. The tweets were collected from March 3, 2020 to June 9, 2020 using the Twitter API.
Type: COVID-19 Twitter sentiment analysis
This dataset is an accumulation of 19.95 million tweets related to the Corona Virus outbreak. The dataset was gathered to perform a Twitter sentiment analysis to identify patterns of disinformation and unusual propagation of content on Twitter.
Type: COVID-19 tweets
The dataset contains tweets related to conversations on the COVID-19 pandemic on Twitter. The dataset was amassed to perform a sentiment analysis and analyze the emotions of the users.
Type: COVID dataset
This Twitter dataset contains over 237 million tweet ids of tweets that mention COVID as a keyword. The dataset was gathered between March and July of 2020.
Type: US Election 2020 dataset
The dataset contains tweets posted by the 2020 US Presidential Election’s Democratic Party Nominees. It also includes tweets posted by the current US President and the Vice-President.
Type: Democratic Party tweets 2020
The dataset is a vast collection of tweet ids associated with the 2020 US elections. The data collection was started on May 20, 2019, and is updated every week. The dataset is released for non-commercial research use.
60. IEEE DataPort
Type: US Presidential Election 2020 dataset
This Twitter dataset includes US Presidential election 2020 related tweets. Specific hashtags and keywords related to #USAelection were used to collect data from the Democratic Party, Green Party, Libertarian Party, and the Republican Party. It contains over 3.5 million tweets that were sent between 1st July 2020 – 12th August 2020.
Type: Natural Language Processing Dataset
The dataset is an accumulation of tweets gathered for natural language processing. The data set was released as a challenge for data scientists looking to get started in Natural Language Processing.
Type: Pre-processed tweets
This Twitter dataset was accumulated to perform sentiment analysis. The dataset contains pre-processed tweets that have been categorized into positive, negative, and neutral categories. The categorization of tweets is done using emoticons.
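Emoticon-based categorization is easy to sketch in code. The example below is a simplified illustration only: the emoticon sets, the rule that mixed or missing emoticons yield "neutral", and the function name `label_by_emoticon` are all assumptions, not the dataset's documented procedure.

```python
# Hypothetical emoticon sets; a real labeling scheme may use many more.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}


def label_by_emoticon(text: str) -> str:
    """Label a tweet by the emoticons it contains (whitespace-separated tokens)."""
    tokens = text.split()
    has_positive = any(token in POSITIVE for token in tokens)
    has_negative = any(token in NEGATIVE for token in tokens)
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    # Assumed rule: no emoticons, or conflicting emoticons, count as neutral.
    return "neutral"


for tweet in ["love this :)", "missed my flight :(", "meeting at 5pm"]:
    print(tweet, "->", label_by_emoticon(tweet))
```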
The dataset was gathered to perform aspect-based sentiment analysis.
Type: Digital humanities 2016 conference
The dataset contains tweets related to the 2016 Digital Humanities Conference that took place in Krakow, Poland between Sunday 11th July and Saturday 16th July. The CSV file contains 3717 tweets posted with the hashtag #DH2016.
Type: US Politicians Twitter dataset
The dataset contains tweets related to the Twitter usernames of US politicians.
66. Harvard Dataverse
Type: Ridgecrest Earthquake
The dataset contains tweet ids accumulated in a five-minute time span starting at the beginning of the magnitude 7.1 Ridgecrest Earthquake.
Type: Tweets of Congress
Tweets of Congress is Alex Litel’s project which accumulates Congress’ daily Twitter output using an automated process that checks Twitter at fixed time intervals.
Type: Twitter bot detection dataset
The dataset contains 1,489,051 tweets that mention the terms Manafort or Cohen. This dataset was collected in August 2018, by Paragon Science using the free public Twitter API.
Type: 2020 US Election Tweets
The dataset contains tweets related to the 2020 US Presidential Election. The Twitter data was accumulated between 15th October-4th November 2020.
Type: 2019 Australian Election Tweets
The dataset contains tweets related to the 2019 Australian Election. The dataset contains over 180,000 tweets accumulated using Twitter API between 10.05.2019-20.05.2019.
Type: Elon Musk Tweets
The dataset contains tweets posted by Elon Musk between 2015 and 2020. The tweets were gathered to identify topics that Tesla mostly tweets about. It was also used to analyze how the tweets influenced Tesla’s stock prices.
72. Mendeley Data
Type: Tweets related to #JeNeSuisPasCharlie
After the shooting attack by self-proclaimed Islamist gunmen, the hashtag #JeSuisCharlie emerged on Twitter as users expressed their condolences. The hashtag #JeNeSuisPasCharlie also appeared, explicitly countering the former. This dataset contains over 70,000 tweets related to the hashtag posted between 7th and 11th January 2015.
Type: Tweets related to stocks
The dataset contains tweets posted by users related to stocks (eg. appl). It was accumulated to perform sentiment analysis on tweets concerning the corresponding stocks.
74. Mendeley Data
Type: Bogota city traffic accident tweets
The dataset contains over 4 million tweets related to Bogota city traffic accidents. The dataset was used in academic research of traffic accident detection and real-time traffic modeling.
Type: Tweets linking to preprint of ARXIV.org
The dataset contains over 57,000 tweets linking to preprints on ARXIV.org accumulated between Jan 1st and Oct 1st, 2012.
Type: Tweets related to hashtags used by political talk shows in Italy
The dataset contains over 2 million Italian tweets collected between 30th Aug 2012 and 30th June 2013 related to hashtags such as #ballaro, #portaporta, #inmezzora, #infedele, #piazzapulita, and more.
Type- #JustDoIt Tweets
This Twitter data set contains over 5000 tweets related to Nike’s campaign hashtag #JustDoIt.
Type- Twitter hashtag data
Digital memory and data generated and curated at the 2018 digital preservation conference #MADM 2018 are recorded in this Twitter data archive. The tweets were accumulated by Glen Cumiskey, the Digital Preservation Resource Manager at the British Museum.
Type- Sentiment Analysis Dataset
The dataset contains over 3.4 million tweets and 4 million images scraped between July and December 2016. The dataset was compiled to train sentiment analysis models and the tweets are classified into positive, negative, and neutral categories.
Type- Tweets related to Australian cities
The dataset contains tweets about Australian cities and was accumulated to perform sentiment analysis. The dataset was compiled between 12/07/2020 and 25/07/2020.
Type- Collection of datasets
This is a collection of Twitter datasets related to various topics. Some of those are hydroxychloroquine, coronavirus, US political tweets, etc.
Type: Trump-related Tweets
This Twitter dataset contains over 9 million Trump-related tweets collected from April 12 to April 24, 2018. The tweets were posted by 1.6 million distinct Twitter users. It was accumulated to identify Twitter bots.
Type: Pfizer & BioNTech Vaccine Tweets
This dataset contains tweets related to the COVID-19 Pfizer and BioNTech vaccine.
Type: Zika Virus Tweets
The dataset contains tweets that mention the Zika Virus. The tweets were gathered between October 2017 and March 2018.
Type: Tweets Related to Nuclear Energy
This Twitter data archive is a collection of tweets related to nuclear energy. It was accumulated to analyze the sentiments of users towards nuclear energy.
Type: Financial Market-related Tweets
This data archive contains tweets from verified users related to stocks traded on the NYSE, NASDAQ, & SNP. It was used to construct sentiment profiles on publicly traded companies.
Type: Annotated Tweets
This Twitter data archive consists of 1.5 billion annotated tweets. The data was collected over a span of almost 5 years, between January 2013 and November 2017.
Type: Covid-19 Tweets
The Twitter archive contains over 100 million tweets related to COVID-19. It also offers a cleaned version of the data that excludes retweets and provides over 20 million unique tweets.
Type: Bitcoin-related Tweets
The data archive contains tweets mentioning Bitcoin. The dataset was collected between January 2016 and March 2019.
Type: News Related Tweets
This Twitter data set contains tweets that share news articles. It also contains tweets and news shared by major news outlets in the United States, along with the associated sharing activities. It was compiled to highlight users’ involvement in the process of news dissemination.
Type: Stock Market-related Tweets
This dataset consists of tweets that share news related to the stock market. The tweets are divided into two categories; Positive: 3,685 tweets and Negative: 2,106 tweets.
Type: Trump Twitter Insults
This dataset is a collection of Trump’s Twitter insults from 2014 to 2021. The dataset contains over 5600 unique tweets.
Type: Tweets Related to #BlackLivesMatter
The dataset is a compilation of tweets from Fortune 100 companies posted between May 25th and July 25th 2020. All the tweets are related to the #BlackLivesMatter (BLM) movement. It was accumulated to analyze corporate responses in relation to the BLM movement.
Type: Tweets Mentioning Diego Maradona
The dataset contains 1,644,234 tweets mentioning Diego Maradona that were sent between November 18 and November 30, 2020. He was a widely known Argentine professional footballer and manager who died on November 25, 2020. He was regarded as one of the greatest players in the history of the sport.
Type: Random Tweets Dataset
The link contains multiple Twitter datasets that contain random tweets compiled through the Twitter stream API.
Type: Tweets Containing the Keyword “solarwinds”.
This Twitter dataset contains 364,170 user ids for tweets mentioning the keyword “solarwinds”. The tweets were accumulated between December 10 and December 19, 2020.
Type: BLM Movement-related Tweets
This Twitter data archive contains all available tweets related to the BLM (Black Lives Matter) movement. The dataset consists of 41.8 million tweets posted by 10 million users from the beginning of the BLM movement in 2013 to June 2020.
Type: Belarus 2020 Election Twitter Data
The Twitter data is available in the Russian language and contains tweets related to the Belarus 2020 election.
Type: #RBG Tweets
This Twitter archive contains 3,825,716 identifiers for tweets that mention the keyword RBG, posted between September 10 and September 22, 2020. RBG stands for Justice Ruth Bader Ginsburg, who died on September 18, 2020. The hashtag #RBG was created to celebrate her life.
Type: Level Stock Return Prediction Dataset
The dataset contains 862,231 English tweets related to the stock market. It also offers a cleaned version of the dataset that comprises 85,176 unique tweets. The dataset was accumulated to study the impact of public opinions and social events on the stock market.
Type: Random Tweets
This Twitter dataset contains random tweets. It contains 20,000 rows, each with a user name, a random tweet, account profile and image, and geographical location info.
Type: QAnon Twitter Dataset
This Twitter dataset contains 43.9 million tweets posted by 2.75 million unique users. The dataset contains tweets posted after July 2020.
Type: Bitcoin Tweets
This Twitter dataset contains raw tweets collected in a day and includes the keyword bitcoin. The tweets were originally compiled to analyze the public sentiment towards bitcoin.
Type: ISIS Using Twitter
The dataset is a collection of 17,000 tweets posted on Twitter by pro-ISIS fanboys, after the 2015 Paris attack. The tweets were gathered to develop effective counter-messaging measures against violent extremists.
Type: Tweets with ‘metoo’ keyword
All the tweets in this dataset mention the keyword ‘metoo’. The Twitter dataset consists of tweets posted since October 2017.
106. IEEE DataPort
Type: COVID-19 Tweet Dataset
The dataset includes tweet-ids and sentiment scores for the tweets. It was compiled to analyze the sentiments expressed by users in relation to COVID-19. The dataset link also contains a geo-tagged version of the dataset.
107. IEEE DataPort
Type: Multilingual COVID-19 Tweets
This dataset contains over 524 million multilingual COVID-19 related tweets that are geotagged. The dataset was extracted from over 218 countries and 47K cities. The extracted tweets were posted by 43 million Twitter users, including 209k verified accounts, in 62 unique languages.
Type: Covid-19 Dataset
The dataset contains over 1 billion tweets related to COVID-19. The dataset is part of an ongoing project and is updated frequently.
Type: Congress Tweets
The dataset is a collection of congress tweets available for download in JSON format.
Type: COVID Mental Health and Symptom Dataset
The dataset contains tweets from users describing their mental health issues and symptoms. It was accumulated to study the progression of psychological health issues among users during the worldwide pandemic.
Type: Twitter User Data
This Twitter data set contains 20,000 rows of Twitter user data. The data consists of the user name, a random tweet, profile image, and location info.
Type: Twitter Sentiment Dataset
The dataset contains over 1.6 million tweets in 15 European languages. The tweets are classified according to the sentiment expressed in each of those tweets. The dataset was prepared to train Twitter sentiment classifiers and to study the difference between language usage on Twitter.
Type: Twitter-based Traffic Information
The dataset contains traffic-related tweets that have been collected using the Twitter search API. The tweets are labeled and classified into three categories: non-traffic, traffic incidents, and traffic conditions and information.
Type: Customer Support Tweets
This is a large, modern collection of customer support tweets intended to aid innovation in natural language understanding and conversational models, and to support the study of modern customer support practices and their impact.
Type: Publicly Accessible Tweets
The dataset contains publicly accessible tweets accumulated by the Greater London Authority (GLA) to develop a deeper understanding of social integration. It analyzed the sentiments expressed in the tweets on a variety of topics.
Type: Twitter Analytics Data
The dataset contains simple Twitter analytics data, including text, user information, confidence, profile dates etc.
Type: Mapping Education Insecurity
The dataset illustrates the results of a trial run by the Center for Humanitarian Data that started in May 2019 and is planned to finish in February 2021. The trial uses machine learning to identify relevant Twitter posts on the subject of education insecurity in English, French, and Arabic, and covers African and Middle Eastern countries.
Type: Singapore Twitter Data
This dataset contains a collection of information cascades created by Singapore Twitter users. It covers a total of 184,794 Twitter accounts, whose tweets were crawled from 1 April to 31 August 2012. The dataset contains 32,479,134 tweets in total.
Type: Geographical Distribution of Twitter Users During COVID-19
The information we have on Coronavirus (COVID-19) is broken down into three datasets and illustrates the geographical distribution of Twitter users and tweets about the pandemic. The AIDR system (www.aidr.qcri.org) has gathered and analyzed the data. Check out the details about the datasets in the various files and resources.
Type: Public tweets
467 million tweets were sent out by 20 million people during the course of seven months, from June 1 to December 31. It represents between 20 and 30 percent of all public tweets sent within the specified time period.
Type: Stock Market
Tweets about the stock market are collected in this database. A total of 943,672 tweets mentioning the S&P 500 (#SPX500), the top 25 S&P 500 businesses (#SPX500), and the Bloomberg tag (#stocks) were collected between April 9 and July 16, 2020.
Type: Random public tweets
The dataset has 20,000 rows, each containing a user name, a random tweet, an account profile, and photographs, as well as information about the person’s location.
Type: Tweets of Dutch users
Over 80K Dutch Twitter users are included in this anonymized dataset, which has 80 profile characteristics (including age and gender).
Type: Tweets of 20 most popular users
Data was collected by crawling Twitter’s REST API with the Python tool Tweepy. This dataset includes the tweets of the 20 most popular Twitter users (based on the number of followers) but ignores retweets.
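As a rough illustration of the "ignores retweets" step, the sketch below filters retweets out of tweets that have already been fetched. It is not the dataset author's code. It assumes each tweet is a dictionary shaped like a classic REST API payload, where retweets carry a `retweeted_status` field, and falls back to the "RT @" text prefix for plain exports. The function names `is_retweet` and `drop_retweets` and the sample records are hypothetical.

```python
def is_retweet(tweet: dict) -> bool:
    """Return True if a tweet record looks like a retweet."""
    # Assumption: REST-style tweet objects carry "retweeted_status" on retweets.
    if tweet.get("retweeted_status") is not None:
        return True
    # Fallback heuristic for plain-text exports: classic retweets start with "RT @".
    return tweet.get("text", "").startswith("RT @")


def drop_retweets(tweets: list) -> list:
    """Keep only tweets that are not retweets, preserving order."""
    return [t for t in tweets if not is_retweet(t)]


# Hypothetical sample records, for illustration only.
sample = [
    {"id": 1, "text": "Original thought"},
    {"id": 2, "text": "RT @someone: Original thought"},
    {"id": 3, "text": "Shared post", "retweeted_status": {"id": 1}},
    {"id": 4, "text": "Another original tweet"},
]

originals = drop_retweets(sample)
print([t["id"] for t in originals])
```

The text-prefix check is only a heuristic, so a tweet that happens to begin with "RT @" without being a retweet would also be dropped.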
Type: Twitter trending topics
There are 1,036 trending topics in this comma-separated value file. Each hot topic’s tweets are collected in a separate tweets folder.
Type: Retweet Dataset
A collection of tweets and the replies to those tweets that express the most common sentiment. It contains automatically labeled responses to 34,953 different tweets with unique identifiers (1,519,504 replies in total).
Type: Virality Measure of Tweets
This dataset consists of two files in TSV format derived from a large number of tweets (16,754,250) that were identified as containing different forms of “numeric data” in an extended collection of tweets from Twitter’s 1% public sample over 11 months from September 2018.
Type: Random Tweets
The dataset contains random tweets extracted from Twitter using Twitter data scrapers.
Type: Tweets with #Data17
IFTTT was used to collect tweet data based on the hashtag #Data17. Tweets were collected from September 19 to October 12 this year.
Type: Tweets mentioning Claritin
For the month of October 2012, this dataset contains all tweets mentioning Claritin. Negative sentiment, gender, and whether or not an undesirable occurrence is mentioned are all factors that are taken into account when tagging the tweets.
Type: Ukraine conflict Twitter dataset
This dataset contains tweets related to the Ukraine-Russia conflict.
Type: Squid game tweets
The Twitter dataset includes tweets related to the widely popular Netflix TV show Squid Game.
Type: 2015 Super Bowl (deflated football rumors)
There was a lot of talk about deflated footballs and whether the Patriots cheated before the 2015 Super Bowl. This dataset was compiled to analyze fans’ sentiments about the rumors.
Type: Sentiment Analysis
Between May 2013 and June 2015, a total of 13 Twitter handles affiliated with British institutions were tracked. This dataset was created to categorize sentiments expressed on Twitter regarding museum-based art and cultural experiences.
Type: Random tweets
Microblog track participants at TREC 2011 were given access to approximately 16 million tweets from January 23rd through February 8th, 2011. Reusable, representative samples of the Twittersphere are incorporated into the corpus, which includes important and spam tweets.
For customized Twitter dataset requirements
There are instances when more specific Twitter datasets are required. This is where TrackMyHashtag comes in. It is a tool that allows us to download a more targeted set of data to perform research or sentiment analysis. This is a paid platform; prices start from $30 and vary with the required Twitter dataset.
Submit the request for your required Twitter datasets: Historical Twitter dataset request form
TrackMyHashtag is a Twitter analytics tool that allows you to download customized Twitter datasets. It is an AI-based tracking tool that lets you track historical as well as real-time Twitter data related to any hashtag, keyword, or account. The features of TrackMyHashtag are:
1. Twitter dataset of any time period
2. Twitter datasets related to any hashtag, keyword, account, or search term
3. Geo-location based Twitter data
4. Language-based Twitter data
The following metadata is present in TrackMyHashtag’s data sheet of historical Twitter datasets:
- Tweet ID, URL, and posted time.
- Tweet content.
- Tweet type and Tweet source.
- Retweets and likes received.
- Tweet location and language.
- User ID, name, username, bio, profile URL, followers, following, and account creation date of the Twitter user who posted the tweet.
- Twitter account’s verification and protected status.
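To show how a sheet with fields like these might be used, here is a minimal sketch that reads a CSV export and summarizes it by language and total engagement. The column names (`tweet_id`, `posted_time`, `content`, `tweet_type`, `retweets`, `likes`, `language`, `username`) and the sample rows are assumptions made for illustration. The actual headers in a TrackMyHashtag export may differ, so adjust them to match the file you receive.

```python
import csv
import io
from collections import Counter

# Hypothetical sample export; the real column names may differ.
SAMPLE_CSV = """tweet_id,posted_time,content,tweet_type,retweets,likes,language,username
1,2020-01-05 10:00:00,Hello world,Tweet,3,10,en,alice
2,2020-01-05 11:30:00,Bonjour,Tweet,1,4,fr,bob
3,2020-01-06 09:15:00,Hello again,Reply,0,2,en,alice
"""


def summarize(csv_file):
    """Count tweets per language and sum retweets plus likes."""
    reader = csv.DictReader(csv_file)
    per_language = Counter()
    engagement = 0
    for row in reader:
        per_language[row["language"]] += 1
        engagement += int(row["retweets"]) + int(row["likes"])
    return dict(per_language), engagement


languages, total_engagement = summarize(io.StringIO(SAMPLE_CSV))
print(languages)
print(total_engagement)
```

For a real file, replace `io.StringIO(SAMPLE_CSV)` with `open("export.csv", newline="", encoding="utf-8")`, where `export.csv` is a placeholder for your downloaded file.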
Data is essential for research, sentiment analysis, and various other academic purposes. I have included most of the free Twitter datasets that had working download links. Due to changes in Twitter’s TOS, it is becoming increasingly difficult to acquire this data; however, all the links mentioned in this article were working at the time of writing. Happy mining!