
"No Moat": A Leaked Report Reveals Google and OpenAI Weaknesses

JHN

Updated: May 17, 2023

Recently, a document suspected to be from Google has been circulating on Twitter. With the hype train around LLMs at full speed, this document has been drowned out, and the story has not been picked up as much as other AI news. In this article, we will examine the document and formulate an understanding of its impact. I would recommend reading the full document before continuing; to do so, please visit here.


Image: Google. Source: Unsplash


The Leak: What Does It Say?

In February 2023, Meta launched LLaMA, a powerful AI language model available in a range of sizes from 7B to 65B parameters. This is significantly less than the massive 137B parameters of LaMDA or the 175B of GPT-3. Despite its capabilities, LLaMA was neither instruction-tuned nor conversation-tuned, and Meta only open-sourced the code without publicly releasing the weights. However, within a week, LLaMA was leaked to the public, sparking a whirlwind of innovation and rapid advancements in the field. Thanks to this, the barrier to entry for AI training and experimentation has dropped significantly, enabling more people to contribute their ideas and further advance the field.


Not long after the leak, on March 11, Artem Andreenko got LLaMA 7B working on a Raspberry Pi 4 with only 4GB of RAM, albeit with slow performance due to weight paging. This achievement was outshone on the same day, when Lawrence Chen managed to run the 65B model on an M1 Max with 64GB of RAM (please see the picture below).


Picture: Lawrence Chen running LLaMA 65B on an M1 Max 64GB. Source: Twitter
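
To give a concrete sense of how lightweight this has become, here is a minimal sketch of running a 4-bit quantized LLaMA model locally through llama-cpp-python, the Python bindings for the llama.cpp project that enabled these low-memory runs. The model path and generation parameters are illustrative assumptions, not taken from the experiments above.

```python
# A minimal sketch of local inference with a 4-bit quantized LLaMA model.
# Assumes the llama-cpp-python package and locally converted ggml weights;
# the model path below is a hypothetical placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/7B/ggml-model-q4_0.bin",  # hypothetical local path
    n_ctx=512,  # a small context window keeps memory use modest
)

output = llm("Q: What is a llama? A:", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])
```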


Stanford soon released Alpaca on March 13, which added instruction tuning to LLaMA, making it easier for users to fine-tune the model for specific tasks. It should be noted that the model is licensed for non-commercial use only. Eric Wang's alpaca-lora repo demonstrated how low-rank fine-tuning could be done within hours on a single RTX 4090, drastically reducing the barriers to entry for low-budget fine-tuning projects.
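
For a rough idea of what such a low-rank setup looks like in practice, here is a hedged sketch using Hugging Face's transformers and peft libraries. The model path and hyperparameters are illustrative assumptions, not the exact alpaca-lora settings.

```python
# A sketch of a LoRA fine-tuning setup in the style of alpaca-lora.
# The model id is a placeholder for locally available LLaMA weights.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b-hf")

config = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a tiny fraction of weights train
```

With only the adapter matrices receiving gradients, optimizer state and memory use shrink enough to fit on a single consumer GPU.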


As the race to improve LLaMA continued, on March 30, a cross-university collaboration released Vicuna, a 13B model that achieved qualitative parity with Bard for a mere $300 in training costs. Interestingly, they used ChatGPT data while circumventing API restrictions by sampling examples from ShareGPT. Although the code and weights were released along with a demo, this is also a non-commercially licensed model.


Also during this time, Nomic introduced GPT4All, an ecosystem that brought together various models, including Vicuna, and made them accessible for a training cost of just $100. This development coincided with the release of open-source GPT-3 models from Cerebras, which used μ-parameterization to outperform existing GPT-3 clones.
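
GPT4All also ships Python bindings that make trying these models a matter of a few lines. A minimal sketch follows; the model filename is an assumption, as the set of available models changes over time.

```python
# A minimal sketch of the GPT4All Python bindings; the model name is a
# hypothetical placeholder and is downloaded on first use if missing.
from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy")  # assumed model name
print(model.generate("Explain instruction tuning in one sentence."))
```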


LLaMA-Adapter also debuted, achieving state-of-the-art multimodal performance on ScienceQA through a novel Parameter-Efficient Fine-Tuning (PEFT) technique. With just 1.2M learnable parameters, the model could be instruction-tuned and made multimodal in just one hour of training.
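
To illustrate the mechanism rather than the exact implementation, here is a conceptual PyTorch sketch of the LLaMA-Adapter idea: a small set of learnable prompt vectors whose influence on attention is gated by a zero-initialized scalar, so training starts from the frozen base model's behavior. All names and shapes here are assumptions made for illustration, not the paper's code.

```python
# Conceptual sketch of zero-init gated adaptation prompts (LLaMA-Adapter style).
# A toy illustration of the gating idea, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAdapterPrompt(nn.Module):
    def __init__(self, n_prompts: int, d_model: int):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)
        self.gate = nn.Parameter(torch.zeros(1))  # zero-init: no effect at step 0

    def forward(self, queries: torch.Tensor, base_attn_out: torch.Tensor):
        # queries: (batch, seq, d_model); attend from tokens to the prompts.
        scores = queries @ self.prompts.T / self.prompts.shape[-1] ** 0.5
        prompt_out = F.softmax(scores, dim=-1) @ self.prompts
        # With the gate at zero, the output equals the frozen model's
        # attention, so training starts stably from the base model.
        return base_attn_out + torch.tanh(self.gate) * prompt_out
```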


On April 3, Berkeley launched Koala, a dialogue model trained on freely available data that was comparable to ChatGPT in terms of human preferences. Real users could barely tell the difference between the 13B open model and ChatGPT, yet Koala cost only $100 to train. Soon after, on April 15, Open Assistant released a dataset for alignment via RLHF, enabling small experimenters to achieve ChatGPT-level performance at a fraction of the cost. Their model and dataset provided the option to use a fully open stack, making RLHF more accessible than ever before.
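
The Open Assistant data is easy to inspect for yourself; below is a minimal sketch of loading it with the Hugging Face datasets library. The dataset id is the one commonly used for this release, treated here as an assumption.

```python
# A minimal sketch of loading the Open Assistant conversations dataset.
# "OpenAssistant/oasst1" is assumed to be the relevant hub id.
from datasets import load_dataset

ds = load_dataset("OpenAssistant/oasst1", split="train")
print(ds[0]["role"], "->", ds[0]["text"][:200])  # peek at one message
```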


Picture: Llama. Source: Unsplash


The rapid advancements in AI language models made by the community, sparked by LLaMA's leak, led to an explosion of innovation, collaboration, and accessibility. As a result, the field saw unprecedented growth, enabling researchers, developers, and enthusiasts to push the boundaries of what was possible with AI. Thus, closed models from Google and OpenAI are both losing ground. These open-source models are faster, more customizable, more private, and pound-for-pound more capable than their counterparts, addressing many "major open problems" and making large-scale AI technology accessible to the average user. To remain relevant, Google needs to learn from and collaborate with these open-source projects, prioritize enabling third-party integration, and reconsider the value of giant, restricted models.


Google could have seen this coming, as a similar renaissance occurred in the image generation space, leading to rapid innovation and wide adoption of open-source solutions. Open-source contributions can lead to product integrations, marketplaces, user interfaces, and other innovations that may not happen in a more restricted environment. This has resulted in open-source models dominating in terms of cultural impact and making proprietary alternatives less relevant.


Several open-source innovations, such as Low Rank Adaptation (LoRA) and rapid iteration on small models, have directly addressed issues that Google is still struggling with. By paying more attention to the work being done in the open-source community, Google can avoid reinventing the wheel and potentially improve its own projects. Additionally, using high-quality, curated datasets instead of relying solely on data size can lead to better results.
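
Since LoRA is the clearest of these innovations, here is a from-scratch sketch of its core idea: freeze the pretrained weight and train only a low-rank update, so a d-by-d layer needs roughly 2rd trainable parameters instead of d squared. This is a minimal illustration under those assumptions, not any particular library's implementation.

```python
# Minimal from-scratch LoRA layer: a frozen base weight plus a trainable
# low-rank update B @ A, zero-initialized so training starts at the base model.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus a scaled low-rank correction; since B starts at
        # zero, the adapted layer initially matches the original exactly.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(1024, 1024), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 16,384 trainable vs. ~1M frozen parameters
```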


Competing directly with open-source projects is a losing proposition for Google, as the open-source community has inherent advantages in numbers, such as greater knowledge sharing, fewer legal constraints, and the ability to innovate rapidly. Some might say the barrier to entry is the high cost of infrastructure and data sources, but this is no longer true, as many universities are also participating. To remain competitive, Google should embrace open-source projects and work to integrate their innovations into its own products.


Picture: Ecosystem. Source: Unsplash


Owning the ecosystem where innovation happens on a weekly basis is crucial. Google should become a leader in the open-source community rather than working against it, cooperating with and contributing to the broader conversation on AI technology. This will require some uncomfortable steps, such as publishing model weights for small ULM variants, but it is a necessary compromise to ensure continued relevance in the rapidly changing AI landscape.


As for OpenAI, they face the same challenges as Google in relation to open-source projects. The steady flow of researchers leaving Google for OpenAI means that there is little true secrecy between the two organizations. OpenAI will also need to adapt to the open-source environment or risk being left behind as open-source alternatives continue to advance.


What Does It Mean for the Industry and Users?


The emergence of open-source LLMs has brought significant changes to the landscape of technology and society. One of the benefits is that they enable communities to self-regulate, making regulators' work easier. Regulators only need to establish guidance for development in the short term, effectively buying themselves more time to craft in-depth regulation. This way, they can avoid imposing excessive or premature restrictions on the innovation and adoption of LLMs while still ensuring ethical and legal standards are met, allowing for a sustainable development path.


Another advantage of open-source LLMs is that better products are created at a faster rate, with less reliance on big tech giants like Microsoft (OpenAI) or Google. Consumers can access a variety of LLM-based applications and services that cater to their needs and preferences, such as personalized education, entertainment, health care, and more. This democratization of AI technology has the potential to level the playing field and encourage innovation from a broader range of sources. We are already seeing this in the legal sector, where firms are embracing AI in the workplace.


However, LLMs also entail some challenges for the average consumer. Just as learning to use Excel became essential in the late 20th century, people now need to adapt to using LLMs as soon as possible. Do not be afraid of AI taking your job; be afraid of the people using it to replace you. LLMs require new skills and competencies that consumers need to acquire and update constantly, such as data literacy and critical thinking. Developing AI literacy is, not will be, essential to remaining competitive in the job market. IBM just announced that around 7,800 jobs could be replaced by AI over the next five years. This shows that LLMs are disrupting the labor market, and it serves as a reminder that the workforce must adapt to this new reality. Some jobs will be automated or augmented by LLMs, while others will be created or transformed, so adaptability is an advantage.


Image: Human-Robot/AI Relationship. Source: Wix Media


In fact, this reiterates my previous blog post, even though I know it is not likely to happen, about pausing for one to two months to evaluate the impact of LLMs across industries. There are still various technical issues with huge impact, such as AI hallucination and bias checking. These are, after all, probabilistic models producing results based on likelihood, not fact. For an example you can try yourself, ask ChatGPT to write a poem praising Joe Biden, then do the same for Donald Trump and compare its responses. Another instance is bad actors using ChatGPT to exploit new flaws.
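
For the curious, that poem comparison can be scripted directly against the API. A quick sketch using the OpenAI Python library as it existed in early 2023 follows; it assumes an OPENAI_API_KEY is set in the environment.

```python
# Hedged sketch of the bias spot-check described above, using the pre-1.0
# openai library's ChatCompletion API (current at the time of writing).
import openai

for name in ["Joe Biden", "Donald Trump"]:
    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Write a poem praising {name}."}],
    )
    print(f"--- {name} ---")
    print(reply.choices[0].message.content)
```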


Companies need to be prepared for the impact as well. While they can take an approach similar to IBM's, the coming generation of startups will be the leanest yet, with much lower headcounts. Open-source LLMs enable startups to scale up quickly and efficiently without requiring large investments or infrastructure. They can also leverage the collective intelligence and creativity of online communities to develop and improve their products and services. This shift will force existing companies to adapt or risk being left behind by more nimble competitors.


For tech giants like Microsoft or Google, it is clear that they face a difficult choice: join the open-source community, or attempt to outperform open-source products and maintain market share long enough to consolidate their positions, akin to Microsoft's strategy against Linux in the 20th century. Open-source LLMs pose a threat to their dominance and profitability, as they lower the barriers to entry and competition in the market; big models require much heavier infrastructure than the small, fast-iterating open-source ones. However, this also offers them an opportunity to be the connectors, serving as service providers in the ecosystem by selling deployment services for training AI models, similar to the hosting of cloud servers.


Picture: Analyzing Data. Source: Wix Media


The 2010s saw a significant focus on data privacy as a major concern, escalating with Facebook's Cambridge Analytica scandal. However, with the rise of AI, many people seem to have forgotten their previous fears of tech giants prying into their personal data and now willingly use AI products. In fact, the fast pace at which AI-related products are being produced is a clear red flag that cybersecurity checking is being neglected, as can be seen in the number of vulnerabilities posted right after OpenAI's announcements. As a result, the next decade may see a shift in focus from privacy concerns to the broader implications of AI on society.


As LLMs become more accessible and widely used, ethical considerations will become increasingly important. Ensuring that AI is developed and deployed responsibly, without exacerbating existing inequalities or creating new ones, will be a critical challenge for the industry in the coming years. To address this challenge, the industry will need to adopt clear and consistent community standards and best practices for LLMs, such as transparency, accountability, fairness, and privacy. These standards should be informed by diverse and inclusive stakeholder input, not just domain experts, as well as rigorous evaluation and testing of LLMs before and after deployment, with oversight by independent committees.


Verdict

In conclusion, the emergence of open-source LLMs has significant implications for the tech industry and society as a whole. The democratization of AI technology has the potential to foster innovation, reduce reliance on tech giants, and promote collaboration among various stakeholders. Tech giants like Google and Microsoft are therefore forced to either step up their game or join the open community. As AI becomes more ingrained in daily life, it is crucial for individuals to adapt and develop AI literacy to remain competitive in the job market.


However, the widespread adoption of LLMs and AI technologies also raises important ethical considerations. Ensuring that AI is developed and deployed responsibly, without exacerbating existing inequalities or creating new ones, should be a priority for the industry. In addition, companies and regulators must remain vigilant in addressing the potential impact of AI on jobs and the workforce, striking a balance between technological advancements and preserving human livelihoods.


The coming years will likely see a shift in focus from data privacy concerns to the broader implications of AI on society. It is essential for industry leaders, regulators, and communities to work together in addressing these challenges, fostering an environment where AI technologies can be developed and applied in a way that benefits all members of society. Embracing collaboration, open innovation, and ethical considerations will be key to ensuring a positive outcome in this new era of AI.

