More data to train

Interesting fact: OpenAI’s Whisper was developed only because the company had already copied, analyzed, and used every usable text on the internet to train its LLM. With Whisper they were able to transcribe the audio tracks of YouTube videos and use the transcripts for training as well. indiatimes.com writes in “How tech giants cut corners to harvest data for AI”: The artificial intelligence lab had exhausted every reservoir of reputable English-language text on the internet as it developed its latest AI system. It needed more data to train the next version of its technology – lots more. ...

May 12, 2024 · 1 min · 134 words

Who would pay

Dylan Patel publishes a leaked internal Google document, “We Have No Moat, And Neither Does OpenAI”: This recent progress has direct, immediate implications for our business strategy. Who would pay for a Google product with usage restrictions if there is a free, high-quality alternative without them? And we should not expect to be able to catch up. The modern internet runs on open source for a reason. Open source has some significant advantages that we cannot replicate. ...

May 8, 2023 · 1 min · 91 words