More data to train
Interesting fact: OpenAI reportedly developed Whisper because the company had already copied, analyzed, and used all the usable text it could find on the internet to train its LLM. With Whisper, it could transcribe the audio tracks of YouTube videos and use those transcripts for training as well. In “How tech giants cut corners to harvest data for AI”, indiatimes.com writes: “The artificial intelligence lab had exhausted every reservoir of reputable English-language text on the internet as it developed its latest AI system. It needed more data to train the next version of its technology – lots more. ...”