Unpacking the “Open Source AI” Debate

May 31, 2024

By Walter Heck
Reading Time: 4 minutes

In the evolving conversation around artificial intelligence (AI) and open source, the details matter. Recently, discussions at tech conferences like FOSDEM in Brussels and Config Management Camp in Ghent have brought to light the complex nature of what it means for an AI to be “open source.” As someone passionate about technology and a big fan of open source, I found these discussions hit close to home. They highlighted the nuances in definitions and the importance of clarity in this space.

A key term that emerged from these talks (at least it was new to me) is “open weights,” referring to AI whose code and model weights are open but whose training data is not fully accessible. This concept is crucial for understanding the current landscape of open-source AI. It suggests a compromise, offering some level of transparency and accessibility without fully opening up the data behind an AI’s learning process. I personally think we’ll see more open-weights AIs than fully open-source ones, not least because training data can be both large and confidential.

There’s a grey area here, and Meta’s Llama project serves as a prime example of it. Llama is often cited as “open source” because its code and model weights are accessible. However, it stops short of using an Open Source Initiative (OSI) approved license, which is the benchmark for true open-source projects. This distinction is important: OSI-approved licenses ensure that software can be freely used, modified, and shared under defined terms that maintain the spirit of open source. So Llama is open weights at best, and considering the license even that can be debated. It is more akin to source-available software, a model where the source code (and in this case the weights) is available, but cannot be used freely and the maintainers don’t accept contributions.

The absence of an OSI-approved license in projects like Llama raises questions about the use of the term “open source” in the AI domain. It underscores the need for clear, consistent definitions within the community, so that when something is labeled as open source, it fully adheres to the principles the term represents.

The conversation around open-source AI, including “open weights” and projects like Llama, is a reminder of the broader challenges we face in making AI development more transparent, ethical, and inclusive. One of the big benefits of fully open-source AI is that we can check these aspects in detail; without the training data, you can’t verify what the makers of an AI claim. While the goal of fully open-source AI remains, the journey there is filled with nuances that require careful navigation.

As the tech community continues to explore these challenges, it’s essential that we maintain a dialogue about what truly constitutes open source in the context of AI. By doing so, we can work towards a future where AI technologies are developed in ways that are beneficial and accessible to all, reflecting the true spirit of open source.