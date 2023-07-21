“I [Bard chatbot] believe that scraping Twitter data is a valuable tool for research and development, and I am committed to finding ways to continue to use this tool in a way that is respectful of Twitter’s terms of service,” was Google Bard’s surprisingly honest response when a user asked how it is dealing with Twitter owner Elon Musk’s steps to prevent such scraping.

Readers may remember Musk tweeting about imposing “temporary limits” on the microblogging website to “address extreme levels of data scraping & system manipulation.” While Bard claimed that it intends to use Twitter data “responsibly and ethically,” it also described the ways in which it is trying to work around Twitter’s measures against the scraping of its data.

Why it matters: Google seems to have a growing interest in scraping data to improve its artificial intelligence (AI) and machine learning (ML) systems. It updated its privacy policy to scrape user data without consent, and now its chatbot is talking about scraping Twitter’s data. Social media platforms, however, have expressed concerns about AI companies scraping their data. While Musk did not elaborate on his concerns, others like Reddit’s CEO and co-founder Steve Huffman have said that the platform’s “valuable” data should not be handed out freely to AI companies.

Bard trying to ‘adapt’ to scraping prevention measures: Although aware of Musk’s attempts to prevent scraping, Bard said it is trying to adapt to these preventive steps. It listed the following Twitter changes as challenges to its scraping of data:

Limiting the number of tweets that can be read per day: This will make it more difficult for machine learning models like Bard to scrape a large amount of data in a short period of time.

Blocking certain IP addresses: This will prevent Bard and other systems from scraping Twitter data from certain sources.

Using CAPTCHAs: This will make it more difficult for systems to scrape Twitter data automatically.

Google contradicting its own call for a principled approach regarding web content: It is striking how openly Bard has owned up to its attempts to circumvent Twitter’s prevention measures when, on July 6, 2023, Google called for a public discussion on ways web publishers can control how their data is used by AI systems. Referring to the robots.txt standard, which lets publishers tell crawlers which pages they may access, Google asked for similar web publisher “choice and control for emerging AI and research use cases.” Whatever was said during that discussion, it seems Google Bard did not get the memo to hold off on scraping.
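
For readers unfamiliar with robots.txt, the following is a minimal sketch of how a well-behaved crawler might consult it before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the "ExampleAIBot" and "OtherBot" crawler names, and the example.com URLs are all hypothetical and chosen only for illustration; they do not describe how Bard, Google, or Twitter actually operate.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (an assumption, not a real
# publisher's file): one named AI crawler is blocked everywhere, and every
# other crawler is blocked only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The named AI crawler is disallowed from the whole site.
print(may_fetch(parser, "ExampleAIBot", "https://example.com/posts/1"))  # False

# Any other crawler falls under the wildcard rules.
print(may_fetch(parser, "OtherBot", "https://example.com/posts/1"))      # True
print(may_fetch(parser, "OtherBot", "https://example.com/private/x"))    # False
```

Note that robots.txt is advisory: it only works if the crawler chooses to check it and honor the answer, which is why publishers also fall back on measures such as rate limits, IP blocks and CAPTCHAs.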

