Crypto Endevr

AI training dataset used by tech giants allegedly created by scraping YouTube videos in violation of terms


AI Research Group’s Dataset Controversy: A Closer Look

Background

EleutherAI, a non-profit AI research group, is at the center of a controversy over a dataset called the Pile. According to ProofNews, the group scraped YouTube subtitles to create the dataset, allegedly in violation of YouTube’s terms of service.

The Pile Dataset

The Pile dataset allegedly includes subtitles of 173,536 YouTube videos from over 48,000 channels. Furthermore, about 12,000 deleted videos are part of the dataset.

Companies Involved

Several top tech and AI firms, including Anthropic, have used the Pile for training. Anthropic spokesperson Jennifer Martinez stated that the dataset includes “a very small subset of YouTube subtitles” but declined to comment on possible violations of YouTube’s terms of service.

Business software firm Salesforce also used the dataset. Salesforce VP of AI research Caiming Xiong said the dataset was “publicly available” and that Salesforce used it for academic and research purposes. ProofNews reported that Salesforce eventually released the same dataset publicly.

Apple used the Pile to train OpenELM, an efficient language model for on-device AI. Nvidia, Bloomberg, and Databricks also used the Pile for AI training.

ProofNews noted that its list of companies that used the dataset is not comprehensive, as companies do not always disclose which datasets they use in AI training.

Content of the Dataset

ProofNews’ search tool indicates that the Pile includes videos from crypto channels and creators, including Coinbase, Cointelegraph, Bitcoin Magazine, BitBoy Crypto, 99Bitcoins, Ivan On Tech, and Andreas Antonopoulos.

The dataset also includes transcripts from major news channels, education channels, late-night shows, popular YouTube hosts, and other categories. The Pile dataset extends beyond YouTube to other websites and online content.

Previous Reports and Lawsuits

ProofNews highlighted an earlier report from the New York Times, which said OpenAI and Google had previously harvested YouTube text. Google, which owns YouTube, said the action was permissible due to its agreement with users. OpenAI did not confirm or deny the report.

AI copyright disputes are widespread. Law firm BakerHostetler lists at least fifteen lawsuits involving tech firms such as Anthropic, Meta, GitHub, Stability AI, Nvidia, and Google. OpenAI faces high-profile lawsuits from Mother Jones’ parent company and The New York Times.

Conclusion

The controversy surrounding the Pile dataset highlights the importance of ensuring that AI research is conducted in a responsible and ethical manner. The use of scraped data without permission can have serious consequences and undermine trust in the AI industry.

FAQs

  • What is the Pile dataset? The Pile is a dataset created by EleutherAI, a non-profit AI research group. According to ProofNews, it includes scraped YouTube subtitles alongside content from other websites.
  • What companies used the Pile dataset? Several top tech and AI firms, including Anthropic, Salesforce, Apple, Nvidia, Bloomberg, and Databricks, used the Pile dataset for AI training.
  • Is the use of the Pile dataset illegal? ProofNews alleges that the YouTube subtitles in the dataset were scraped in violation of YouTube’s terms of service. Companies that commented on their use of the dataset did not address possible violations.
  • What are the implications of AI copyright disputes? AI copyright disputes can have serious consequences, including lawsuits and damage to the reputation of companies involved.
  • What is the current state of AI research? The controversy surrounding the Pile dataset highlights the need for responsible and ethical AI research practices to ensure trust in the industry.
cryptoendevr

© Copyright 2024. All Rights Reserved By Crypto Endevr.
