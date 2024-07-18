Apple, Nvidia and other big tech players are under scrutiny after a report revealed how some of their AI models were trained. According to the report, published by Wired, big tech companies used subtitle files from over 170,000 videos by popular YouTubers such as Marques Brownlee (MKBHD), PewDiePie, MrBeast and others. Among these companies, Apple drew particular attention because it recently rolled out the iOS 18 public beta for iPhone users and its AI-based features are currently the talk of the town. Although Apple can't be directly blamed for using YouTube content taken without permission, since the company sourced it from a third-party non-profit, the iPhone maker has issued a clarification to distance Apple Intelligence from the situation.

How Apple ended up in this situation

Apple's high-profile AI model, OpenELM, was trained using a dataset called the Pile. Released by the non-profit EleutherAI, the Pile contains transcripts of thousands of YouTube videos that were obtained without the creators' consent. While EleutherAI says its aim is to help small developers and academics train AI models, its datasets are open and accessible to anyone with enough computing power and storage space to use them. Apple is one of the companies that reportedly used the data to train an AI model.

Why Apple Intelligence is different

While Apple has not denied using the Pile dataset to train OpenELM, it has confirmed to 9to5Mac that OpenELM doesn't power any of its machine learning or AI features, including Apple Intelligence. Apple says that it introduced the OpenELM model to support research and foster open-source development in large language models. Described by Apple researchers as a cutting-edge open language model, OpenELM was built specifically for research purposes and is not integrated into Apple Intelligence services.

This means that datasets like the Pile, which include transcripts of YouTube videos, play no part in the functionality of Apple Intelligence, the next big thing for iPhones. Instead, Apple says Apple Intelligence relies on licensed data, which includes curated and publicly accessible data gathered by its web crawler. OpenELM is openly accessible through Apple's Machine Learning Research website, which Apple presents as part of its commitment to advancing the broader scientific community's understanding of and capabilities in language modeling.



