Navigating the Complexities of Open Source AI: Insights from Carnegie India Summit

With companies like Meta marketing their AI models as open source while placing restrictions on how users can deploy them, more clarity is needed on what “open-sourcing” means in the context of AI.

“Most economists will agree that open source software is a digital public good because it meets the definition, the economist definition of what is a public good,” Meta’s Public Policy Director Sunil Abraham said during Carnegie India’s Global Technology Summit, held in Delhi from December 4-6. He was addressing the debate between open-source and closed-source artificial intelligence models.

Interestingly, Meta’s large language model (LLM, a type of artificial intelligence program) LLaMA 2, launched in July this year, is billed by the company as open source. Abraham said that even beyond LLaMA 2, Meta founder Mark Zuckerberg has been committed to open-source software from the very beginning. “And in that way, corporations like Meta can contribute to India’s vision around digital public infrastructure by providing the building blocks for that digital public infrastructure, which is open source software or digital public goods,” he explained. He also mentioned that the company has 1,200 open-source projects.

Wait, what does open source mean?

The term open source refers to something people can modify and share because its design is publicly accessible. The term owes its origins to software development and is used to describe software with source code that anyone can inspect, modify, and enhance. But it doesn’t have such a clear definition when it comes to artificial intelligence (AI). “There is no agreed-upon open source definition for AI so that’s something we need to acknowledge. I would say, and others say this as well, that LLaMA for instance wouldn’t qualify as open source under the open source software definition because they use restrictions baked into the license,” Lea Gimpel from the Digital Public Goods Alliance explained.

What Gimpel meant was that different companies interpret open-sourcing differently when it comes to AI. For instance, OpenAI’s ChatGPT is open access in that anyone can use its application programming interface (API) to build applications, but neither the datasets used to train the model nor the codebase are public. OpenAI has also used varying definitions of open source across its different projects. According to Gimpel, the usage restrictions baked into LLaMA 2’s license mean that it would not qualify as open-source software.
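The distinction being drawn here — open access versus open source — can be made concrete in code. The sketch below is purely illustrative: the release profiles are simplified assumptions, not authoritative claims about any vendor’s actual licensing, and the classification labels are our own shorthand for the categories discussed above.

```python
# Illustrative sketch: "open access" (usable via an API) is not the same as
# "open source" (inspectable, modifiable code, weights, and training data,
# with no usage restrictions in the license).

def openness(profile: dict) -> str:
    """Classify a model release along the dimensions discussed above."""
    fully_open = (profile["code_public"] and profile["weights_public"]
                  and profile["training_data_public"]
                  and not profile["license_restrictions"])
    if fully_open:
        return "open source (inspect, modify, redistribute)"
    if profile["weights_public"] and profile["license_restrictions"]:
        return "openly released weights, but with license restrictions"
    if profile["api_access"]:
        return "open access only (usable via API, internals closed)"
    return "closed"

# Hypothetical profiles, assumed for illustration only.
api_only_model = {"api_access": True, "code_public": False,
                  "weights_public": False, "training_data_public": False,
                  "license_restrictions": True}
restricted_weights_model = {"api_access": True, "code_public": False,
                            "weights_public": True,
                            "training_data_public": False,
                            "license_restrictions": True}

print(openness(api_only_model))
print(openness(restricted_weights_model))
```

The point of the sketch is that the two hypothetical releases land in different categories even though both are publicly usable — which is exactly why a shared definition of “open source AI” matters.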

How open source AI can be defined:

“There are currently several work streams and ways that are trying to define open source AI with the community to better understand what would we actually need to open source in order to maintain the benefits that we see in open source software,” Gimpel said. She explained that the factors being considered in the definition are transparency, the ability to modify a system, and reproducibility.

Gimpel explained that from the Digital Public Goods Alliance’s perspective, the focus of open-source AI discussions should be on transparency and accountability, as opposed to reproducibility. She argued that to enable others to build on top of an existing AI model, you don’t need to give them access to the full training datasets. “What you probably need [to give] is a good understanding of what the data is that the model was trained on,” she explained, adding that this could mean a representative snapshot of the data used for training. “I think what we really need to solve for is what is responsible openness in that sense. Where can you make components openly accessible and do this in a responsible way? And where should you probably also keep it closed,” Gimpel said.


Intersections between open source and transparency:

“If there is a regulated entity, a corporation, an academic organization that makes a very specific safety claim about the model that they are releasing, then the only way that we can conclude with scientific confidence that that claim is correct is if we embrace the whole open science paradigm and not just open source,” Abraham argued. He said that for independent researchers to ensure the safety of an AI model, its source code and training data sets would need to be publicly available. He also mentioned that the model would have to be released along with a scholarly paper. “And only then can you have full reproducibility in the tradition of science and independently people can confirm a particular safety claim,” Abraham explained.

Speaking about Meta’s perspective on open sourcing, Abraham said that the company is “non-dogmatic about open source. That means we do have many proprietary products as well.” He explained that when Meta develops an AI model, it asks: what risks does this model pose, and should it be openly released or remain proprietary? “A good example of this is VoiceBox. VoiceBox is a model that can be trained on a three-second sample of anybody’s voice. And after that training is done, VoiceBox can artificially synthesize that person’s speech, replicating accent, etc. So that model, because of the deepfake risks, we decided not to make it open source,” he shared. He explained that the company has “limited, not full source code transparency, but transparency about how much compute was used, how much carbon was burnt, what data sets were used to train the model, etc.”

Is open source always the best choice for AI companies?

Gimpel said that data and privacy need to be considered when deciding whether to open-source an AI model. She explained that developers need to make sure their AI models do not leak personally identifiable information (PII) before open-sourcing them. “Sadly, we currently don’t have any open sourcing standards of when is it to be safe to open source the model. I think that the industry needs to develop that,” she said.

The other aspect that needs to be considered is misuse of the AI model downstream, that is, by those building on top of it. “In the open source software world we have seen that that’s not achievable, you can’t monitor the downstream use of an open source project,” she said. When considering the same for AI, she argued, it is important to identify bottlenecks that everyone working with AI passes through and use those as safeguards against misuse.

For instance, anyone developing AI needs to go through cloud infrastructure. “One idea could be, for instance, to put the liability on cloud providers to make sure that the model is not misused and they will develop measures accordingly to account for them,” she suggested. She also suggested that industries could be asked to obtain a license to operate a specific model. “I guess I would get a lot of bashing from the open source community for saying this because that’s clearly not falling under the open source definition anymore at least not in the ideological sense. But it would help to account for these risks and at the same time ensuring that others can build on a model and innovate on top of it,” she shared.


MediaNama is the premier source of information and analysis on Technology Policy in India. More about MediaNama, and contact information, here.

© 2008-2021 Mixed Bag Media Pvt. Ltd. Developed By PixelVJ
