Crypto Endevr

AI training dataset used by tech giants allegedly created by scraping YouTube videos in violation of terms


AI Research Group’s Dataset Controversy: A Closer Look

Background

A non-profit AI research group, EleutherAI, is at the center of a controversy over a dataset it created called the Pile. According to ProofNews, the dataset includes YouTube subtitles that were scraped in violation of YouTube’s terms of service.

The Pile Dataset

The Pile allegedly includes subtitles from 173,536 YouTube videos drawn from more than 48,000 channels, including about 12,000 videos that have since been deleted.

Companies Involved

Several top tech and AI firms, including Anthropic, have used the Pile for training. Anthropic spokesperson Jennifer Martinez stated that the dataset includes “a very small subset of YouTube subtitles” but declined to comment on possible violations of YouTube’s terms of service.

Business software firm Salesforce also used the dataset. Salesforce VP of AI research Caiming Xiong said the dataset was “publicly available” and that Salesforce used it for academic and research purposes. ProofNews reported that Salesforce eventually released the model it trained on the dataset for public use.

Apple used the Pile to train OpenELM, an efficient language model for on-device AI. Nvidia, Bloomberg, and Databricks also used the Pile for AI training.

ProofNews noted that its list of companies that used the dataset is not comprehensive, as companies do not always disclose which datasets they use in AI training.

Content of the Dataset

ProofNews’ search tool indicates that the Pile includes videos from crypto channels and creators, including Coinbase, Cointelegraph, Bitcoin Magazine, BitBoy Crypto, 99Bitcoins, Ivan On Tech, and Andreas Antonopoulos.

The dataset also includes transcripts from major news channels, education channels, late-night shows, popular YouTube hosts, and other categories. The Pile dataset extends beyond YouTube to other websites and online content.

Previous Reports and Lawsuits

ProofNews highlighted an earlier report from the New York Times, which said OpenAI and Google had previously harvested YouTube text. Google, which owns YouTube, said the action was permissible due to its agreement with users. OpenAI did not confirm or deny the report.

AI copyright disputes are far-reaching. Law firm BakerHostetler lists at least fifteen lawsuits involving tech firms such as Anthropic, Meta, GitHub, Stability AI, Nvidia, and Google. OpenAI faces high-profile lawsuits from Mother Jones’ parent company and The New York Times.

Conclusion

The controversy surrounding the Pile dataset highlights the importance of ensuring that AI research is conducted in a responsible and ethical manner. The use of scraped data without permission can have serious consequences and undermine trust in the AI industry.

FAQs

  • What is the Pile dataset? The Pile is a dataset compiled by EleutherAI, a non-profit AI research group. It allegedly includes scraped YouTube subtitles alongside content from other websites.
  • What companies used the Pile dataset? Several top tech and AI firms, including Anthropic, Salesforce, Apple, Nvidia, Bloomberg, and Databricks, used the Pile dataset for AI training.
  • Is the use of the Pile dataset illegal? The creation of the dataset may have violated YouTube’s terms of service, since the subtitles were allegedly scraped without permission.
  • What are the implications of AI copyright disputes? AI copyright disputes can have serious consequences, including lawsuits and damage to the reputation of companies involved.
  • What does the controversy mean for AI research? It highlights the need for responsible and ethical AI research practices to maintain trust in the industry.
cryptoendevr

© Copyright 2024. All Rights Reserved by Crypto Endevr.
