How the open-source world is wrestling with security and licensing issues for generative AI
The rise of generative artificial intelligence has been accompanied by a growing debate within the open-source community: Are OpenAI and other model providers really open, and how trustworthy is the technology?
This debate has grown more urgent as AI becomes entrenched in the engines running today’s critical systems. Transparency has long been a hallmark of the open-source ethos, yet major questions surround how transparent many of the most widely used AI models truly are.
A report released last fall by the Center for Research on Foundation Models within Stanford University’s Human-Centered Artificial Intelligence institute found that transparency, as measured by how a model is built, how it works and how it is used downstream, lagged across 10 major model providers. On the index’s 100-point scale, the highest score was a mere 54 for Meta Platforms Inc.’s Llama 2, and the lowest was 12 for Amazon.com Inc.’s Titan Text. OpenAI’s popular GPT-4 model ranked third at 48.
“In AI, we are all sort of collectively trying to figure out what openness means,” Jim Zemlin, executive director of the Linux Foundation, said in remarks during the Open Source Summit in April. “In large language models, this is where the definition of openness gets a little tricky.”
Engaging the open-source community to craft AI strategy
During the March gathering of the KubeCon + CloudNativeCon conference in Paris, the Cloud Native Computing Foundation released its Cloud Native Artificial Intelligence whitepaper. The report from the CNCF’s AI Working Group noted an “imperative to clearly define who owns and has access to the data” throughout the AI lifecycle.
SiliconANGLE spoke with Ricardo Aravena, head of engineering at Truera Inc. and a contributor to the CNCF whitepaper, during the KubeCon EU event.
“We don’t have a set definition of what it means to be a transparent model,” Aravena said. “There’s a lot of randomness around this. That part is difficult. We’re engaging the community to get involved and solve some of the challenges.”
Those challenges include creating open models that allow developers to build upon and adapt previous work as they craft a generative AI strategy. That would normally require the ability to replicate training data and training code, something that has not always been readily available.
“That’s the piece that is truly not very open,” Erik Erlandson, senior principal software engineer at Red Hat Inc., said in an interview. “You need to read very carefully the licensing on every model. A central topic of discussion is defining standards for open generative AI models.”
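Erlandson’s advice to read every model’s licensing carefully can at least be partially automated as a first pass. Below is a minimal sketch, assuming the huggingface_hub client library and a model repository that declares its license as a metadata tag; the allowlist is purely illustrative and no substitute for reading the actual license text.

```python
# Minimal sketch: check a model's declared license before pulling it into a
# build. Assumes the huggingface_hub library; the ALLOWED set is illustrative.
from huggingface_hub import HfApi

ALLOWED = {"apache-2.0", "mit", "bsd-3-clause"}  # example allowlist, not legal advice

def declared_license(repo_id: str) -> str | None:
    info = HfApi().model_info(repo_id)
    # License metadata is typically exposed as a "license:<id>" tag on the repo.
    for tag in info.tags:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return None

lic = declared_license("mistralai/Mistral-7B-v0.1")
print(lic, "-> allowed" if lic in ALLOWED else "-> needs manual legal review")
```

Even when the declared tag passes such a check, bespoke model licenses often carry use restrictions that only a human review will catch.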
Efforts are underway to define those standards through a host of industry and community projects. Both the Linux Foundation and CNCF have initiatives focused on this task. IBM Corp., Intel Corp., Meta, Advanced Micro Devices Inc., Oracle Corp., Red Hat and Databricks Inc. are among the companies collaborating in the AI Alliance, a group committed to “developing AI collaboratively, transparently and with a focus on safety, ethics and the greater good,” according to the organization’s website.
The issue, as Erlandson noted, is that when businesses spend a significant amount of capital to build models, they’re not eager to donate them to the open-source ecosystem without some way to recoup the investment.
“Companies spent a lot of money making them,” Erlandson said. “If you’re spending that amount of money creating them, I don’t blame them for not giving it away.”
Leveraging Red Hat Ansible to build trust
Red Hat has been working to navigate the legal complexities of AI by proactively engaging the developer community, with the aim of minimizing legal and licensing disputes and fostering trust in the open-source ecosystem.
Red Hat’s Ansible automation platform has proven to be a useful resource for gaining better clarity into the state of model licensing. In November, the company announced general availability of Ansible Lightspeed with IBM’s watsonx Code Assistant, a generative AI service trained on a base repository of shared Ansible code.
“In the Ansible community, we try to pay attention to the licenses that are part of Ansible Galaxy,” Chris Wright, chief technology officer and senior vice president of global engineering at Red Hat, told SiliconANGLE. “That has become an interesting source of training material. We thought we could create a better outcome when we paid attention to the licenses that are used to train a model.”
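Red Hat has not published the exact pipeline behind Lightspeed’s training, but the license-aware curation Wright describes can be sketched in rough form: tag every training sample with the license declared by its source repository and keep only approved ones. The Sample type and allowlist below are hypothetical, not Red Hat’s actual policy.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    path: str        # where the snippet came from (hypothetical field)
    license_id: str  # SPDX identifier declared by the source repository
    text: str        # the code itself

# Illustrative allowlist only -- not Red Hat's actual training policy.
APPROVED = {"Apache-2.0", "MIT", "GPL-3.0-or-later"}

def curate(samples: list[Sample]) -> list[Sample]:
    """Keep only samples whose declared license is on the approved list."""
    return [s for s in samples if s.license_id in APPROVED]

corpus = [
    Sample("roles/web/tasks/main.yml", "GPL-3.0-or-later", "..."),
    Sample("roles/db/tasks/main.yml", "LicenseRef-Proprietary", "..."),
]
print(len(curate(corpus)))  # prints 1: the proprietary sample is dropped
```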
One of the key areas that has been a focus for Wright and Red Hat is trust. In the software community, the supply chain and provenance are critical when it comes to ensuring a secure experience.
Last year Red Hat began publishing software bill of materials, or SBOM, files for the firm’s core offerings. An open-source project, TrustyAI, was developed by Red Hat and contributed to the community in 2021 as a business automation tool for AI explainability, tracing and accountability. It’s a cloud-native solution built with Kogito and OpenShift that can be deployed to any system and run in any environment. As AI use continues to grow, this could become an avenue for trusting where a particular model came from.
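Consuming those SBOM files is straightforward. As a minimal sketch, the snippet below walks a CycloneDX-format JSON SBOM and lists each component with its declared licenses; the filename and field fallbacks are illustrative, and Red Hat’s own files may use another standard such as SPDX.

```python
import json

# Minimal sketch: list the components and declared licenses in a
# CycloneDX-format JSON SBOM. The filename "sbom.json" is illustrative,
# and real SBOMs vary by generator, hence the defensive .get() calls.
with open("sbom.json") as f:
    sbom = json.load(f)

for comp in sbom.get("components", []):
    licenses = [
        entry.get("license", {}).get("id", "unknown")
        for entry in comp.get("licenses", [])
    ]
    print(comp.get("name"), comp.get("version"), ", ".join(licenses) or "unlicensed")
```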
“One aspect of trust that we see already today is: ‘Where did that code come from?’” Wright noted. “There are some really interesting challenges in this space in terms of building trust.”
Threats to the Linux kernel
Challenges associated with security and trust in the open-source ecosystem became more apparent in late March when it was revealed that a Microsoft developer had spotted malicious code that had been placed in XZ Utils, data compression software commonly used in Linux distributions and Unix-like operating systems. The backdoor had apparently been inserted by a lone developer whose true identity remains unverified.
The near-miss sent a chill through the enterprise security world because Linux is one of the foundational technologies for running the world’s networks. The Linux kernel provides the key interface between a computer’s hardware and its software.
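The compromised releases were XZ Utils 5.6.0 and 5.6.1, tracked as CVE-2024-3094, so a simple version check was the first triage step for many administrators. Here is a rough sketch of that check; actual remediation should follow each distribution’s advisory rather than trust a version string alone.

```python
import re
import subprocess

# Rough triage sketch: flag the known-backdoored XZ Utils releases
# (CVE-2024-3094). Versions 5.6.0 and 5.6.1 shipped the malicious code.
COMPROMISED = {"5.6.0", "5.6.1"}

try:
    out = subprocess.run(["xz", "--version"], capture_output=True, text=True).stdout
except FileNotFoundError:
    out = ""  # xz not installed on this machine

match = re.search(r"(\d+\.\d+\.\d+)", out)
version = match.group(1) if match else None
if version in COMPROMISED:
    print(f"xz {version} is a compromised release -- update or roll back now")
else:
    print(f"xz version {version or 'not found'} is not in the known-bad set")
```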
“Clearly it’s a wakeup call,” said Linus Torvalds, Linux Foundation Fellow and creator of Linux, in remarks during the Open Source Summit in April. “There are a lot of people looking into various measures of trust in the kernel. You trust the people around you to do the right thing. That trust can be violated. How to figure out when it’s been violated is an open problem.”
The XZ Utils backdoor highlights the conundrum facing the open-source world as new waves of code, some of it AI-generated, flow into the ecosystem. In a community built on open contribution, it’s difficult to get a complete picture of every contributor behind every piece of software, let alone verify the security of each contribution.
“The three key questions are: What is the world’s most critical software, who writes it, and is it secure and healthy?” the Linux Foundation’s Zemlin said during the Open Source Summit. “Fortunately, this was actually caught. The system worked in discovering the vulnerability, but the system wobbled a little bit in that this identity got a commit in an important project.”
In the rush to capitalize on the promise of AI, it’s easy to lose sight of the fact that it was all created by humans, who are not going away anytime soon. The industry leaders engaging with enterprises and the open-source community all noted, in presentations and interviews for this story, that the human element remains key. In the end, open-source practitioners believe a solution will be found through a delicate dance between the human intellect and what it has endeavored to create.
“We’re connecting the human processes to AI processes,” said Red Hat’s Wright. “I like to think of this as machine augmented human intelligence. There’s just a great dialogue and back and forth.”