OpenAIList.org

The Community-Sourced Open Source AI Tools List

The Open Source AI/ML Project List

Open source is extending its democratizing power to AI. Innovation happens faster in the open, but the multitude of projects that can accelerate your AI education and work can be difficult to uncover.

The Open Source AI List Project is a collaborative effort to collate and maintain a list of open source AI projects in one place.


The Projects

Tesseract OCR (Model: OCR)

Tesseract is an optical character recognition (OCR) engine originally developed by Hewlett-Packard as proprietary technology in the 1980s. Commonly regarded as one of the most accurate OCR engines available, it was released as open source software with sponsorship from Google in 2006. Its primary implementation targets unstructured data processing and extracting text from images, executed entirely from a command-line interface.

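As a sketch of the image-to-text workflow described above, here is a minimal Python example using the third-party pytesseract wrapper (assuming the Tesseract binary, pytesseract, and Pillow are installed; the input file name is illustrative):

```python
# Minimal OCR sketch via the pytesseract wrapper (assumes the tesseract
# binary is on PATH; "sample.png" is a hypothetical input image).
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("sample.png"))
print(text)
```
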
TensorFlow (Framework)

In the world of open source AI software, Google's TensorFlow needs no introduction. It started as an internal project by the Google Brain team in 2011, based on deep learning neural networks. As the company began using the technology in various ways, it open-sourced TensorFlow in 2015. Today, several popular open source AI frameworks are built on TensorFlow, which enjoys an active global community and widespread learning resources.

Web: www.tensorflow.org
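
For a sense of the API, here is a minimal sketch of defining and training a small Keras model in TensorFlow (layer sizes and the random placeholder data are illustrative only):

```python
import numpy as np
import tensorflow as tf

# A tiny feed-forward classifier; the architecture is illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Train on random placeholder data just to show the API shape.
x = np.random.rand(64, 8).astype("float32")
y = np.random.randint(0, 2, size=(64,))
model.fit(x, y, epochs=1, verbose=0)
```
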
Rasa Open Source

Rasa is among the most popular open source AI software used to build conversational interfaces. While the company mainly drives monetization from its enterprise product, it also offers a powerful open source edition and a separate toolset for improving AI assistants. You can use Rasa to build custom ML models or leverage its pre-built library of models written in TensorFlow. Rasa Enterprise bolts onto the open source platform, bringing SSO-based security, service level agreements, and dedicated support.

Web: rasa.com · GitHub: github.com/RasaHQ
PyTorch (Framework)

PyTorch improves upon the foundational Torch framework for ML, which uses the Lua programming language. Facebook's AI research lab launched PyTorch as a Python-based interface for AI/ML app development under an open source license in 2016; a C++ interface is available as well. Today, PyTorch has developed into a rich ecosystem that gives you all the tools necessary to accelerate AI development from research to production.

Web: pytorch.org · GitHub: github.com/pytorch/pytorch
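
A minimal sketch of PyTorch's Python interface, defining a module, running a forward pass, and backpropagating (all sizes are illustrative):

```python
import torch
import torch.nn as nn

# A toy module; layer sizes are illustrative.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(x)

net = TinyNet()
x = torch.randn(4, 8)   # a batch of 4 random inputs
loss = net(x).sum()     # stand-in for a real loss function
loss.backward()         # gradients now populated on net.fc parameters
```
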
PaLM

PaLM, another language model developed by Google, has 540 billion parameters. It is a dense decoder-only Transformer model trained with the Pathways system; it was the first model to use Pathways to train at this scale, across 6,144 TPU chips, the largest TPU-based training configuration at the time. What makes PaLM stand apart is that it outperformed the prior state of the art on 28 of 29 English NLP tasks.

OPT

Open Pretrained Transformer (OPT) is another leading GPT-3 alternative, a language model with 175 billion parameters. OPT is trained on openly available datasets, allowing more community engagement, and the release includes the pretrained models along with the code used to train them. The model is currently under a noncommercial license and available for research use only. Notably, the full model can be deployed for inference on just 16 NVIDIA V100 GPUs, significantly less hardware than comparable models require.
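
Because the smaller pretrained weights are openly downloadable, OPT can be tried directly; a minimal sketch with the Hugging Face transformers library, using facebook/opt-1.3b, one of the smaller released checkpoints (the prompt is illustrative):

```python
# Sketch: generate text with a small OPT checkpoint via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

inputs = tok("Open source AI is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```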

OpenNN

OpenNN is an open source AI software library for implementing neural networks and ML. Its primary use cases include customer intelligence and industry-specific analytics, such as predictive applications. The company developing and maintaining OpenNN, Artelnics, is known for its pathbreaking AI and big data research. Importantly, OpenNN does not specialize in computer vision or natural language processing, unlike some of the other open source software on this list.

Web: www.opennn.net · GitHub: github.com/Artelnics/OpenNN
OpenCV

Open Source Computer Vision Library, or OpenCV, is a rich library of AI algorithms focused on real-time computer vision. It was launched in the early days of AI development as part of an Intel research project back in 1999. In 2012, it was taken over by a non-profit foundation, which now runs the community, user support, and developer assistance. In 2020, the OpenCV AI Kit campaign was launched to collect funds for new hardware modules.

Web: opencv.org · GitHub: github.com/opencv/opencv
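
A minimal sketch of a classic OpenCV pipeline in Python: load an image, convert it to grayscale, and run Canny edge detection (the file name and thresholds are illustrative):

```python
import cv2

# "sample.jpg" is a hypothetical input; thresholds are illustrative.
img = cv2.imread("sample.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imwrite("edges.jpg", edges)
```
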
Mycroft.ai

Mycroft is the world's leading open source voice assistant. It is private by default and completely customizable. The Mycroft open source voice stack can be freely remixed, extended, and deployed anywhere. Mycroft may be used in anything from a science project to a global enterprise environment.

Web: mycroft.ai · GitHub: github.com/MycroftAI
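
To show what extending the stack looks like in practice, here is a minimal sketch of a Mycroft skill, following the structure from Mycroft's skill documentation (the skill name and intent file are hypothetical):

```python
# A minimal Mycroft skill sketch; "hello.world.intent" is a hypothetical
# intent file that would ship alongside the skill.
from mycroft import MycroftSkill, intent_file_handler

class HelloWorldSkill(MycroftSkill):
    @intent_file_handler("hello.world.intent")
    def handle_hello_world(self, message):
        # Respond using a dialog file bundled with the skill.
        self.speak_dialog("hello.world")

def create_skill():
    # Mycroft calls this factory function to load the skill.
    return HelloWorldSkill()
```
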
Megatron-Turing Natural Language Generation (NLG)

Collaboration also seems to have worked wonders in the GPT-3 domain. One such collaboration, between NVIDIA and Microsoft, resulted in one of the largest language models ever trained, with 530 billion parameters. The model was trained on the NVIDIA DGX SuperPOD-based Selene supercomputer and is one of the most powerful English language models.

LaMDA

Google's LaMDA, a model with 137 billion parameters, has made waves in the natural language processing world. It was built by fine-tuning a family of Transformer-based neural language models. For pre-training, the team created a dataset of 1.5 trillion words, nearly 40 times more than was used for previously developed models. LaMDA has already been used for zero-shot learning, program synthesis, and the BIG-bench workshop.

GPT-4 (Model: Large Language Model)

GPT-4 is OpenAI's most advanced system, producing safer and more useful responses.

Web: openai.com/product/gpt-4
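
GPT-4 itself is not open source, but it is reachable programmatically; a minimal sketch using the official openai Python SDK (v1-style client; assumes an OPENAI_API_KEY in the environment and paid API access):

```python
# Sketch: one chat completion against the GPT-4 API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain OCR in one sentence."}],
)
print(resp.choices[0].message.content)
```
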
Gopher

Yet another DeepMind innovation is Gopher, with 280 billion parameters. The model specializes in answering science and humanities questions, which it handles much better than other language models. DeepMind claims that Gopher can beat language models 25 times its size and can compete with GPT-3 on logical reasoning problems.

GLaM

Yet another Google invention that deserves special mention is GLaM. It is a mixture-of-experts (MoE) model, meaning it consists of different submodels that specialize in different inputs. It is also one of the largest models available, with 1.2 trillion parameters spread across 64 experts per MoE layer, yet during inference it activates only 97 billion parameters per token prediction, as the toy sketch below illustrates.
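
To make the MoE idea concrete, here is a toy routing sketch in Python (not Google's implementation; the 64-expert count and top-2 gating mirror the description above, while the dimensions and weights are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d = 64, 16                             # 64 experts, toy width
experts = rng.standard_normal((n_experts, d, d))  # one weight matrix each
gate = rng.standard_normal((d, n_experts))        # routing weights

def moe_forward(x):
    """Route one token embedding x (shape (d,)) to its top-2 experts."""
    scores = x @ gate                    # score every expert
    top2 = np.argsort(scores)[-2:]       # only 2 of 64 experts activate
    w = np.exp(scores[top2])
    w /= w.sum()                         # softmax over the chosen experts
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top2))

print(moe_forward(rng.standard_normal(d)).shape)  # (16,)
```

Only the selected experts' weights participate in each token's computation, which is how a 1.2-trillion-parameter model can activate just a fraction of its parameters per prediction.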

Dino-Vitb16 (Model: Vision Transformer)

Dino-Vitb16 is a Vision Transformer (ViT) model trained using the DINO method. It was introduced in the paper "Emerging Properties in Self-Supervised Vision Transformers" by Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin, and first released in the authors' repository.

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained in a self-supervised fashion on a large collection of images, namely ImageNet-1k, at a resolution of 224×224 pixels.

Web: huggingface.co/facebook/dino-vitb16
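
A minimal feature-extraction sketch following the Hugging Face model card (the image path is illustrative):

```python
# Sketch: extract DINO ViT features with transformers.
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

image = Image.open("sample.jpg")
processor = ViTImageProcessor.from_pretrained("facebook/dino-vitb16")
model = ViTModel.from_pretrained("facebook/dino-vitb16")

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
features = outputs.last_hidden_state  # (1, 197, 768): [CLS] + 196 patches
```
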
ClearML

ClearML is the result of the recent rebranding of Allegro AI, a provider of open source tools for data scientists and machine learning labs. Along with the rebranding, ClearML announced a free hosted plan to give data scientists the freedom to manage AI/ML experiments and orchestrate workloads without investing in additional resources. ClearML can be leveraged as an MLOps solution, ready for implementation via just two lines of code.

Web: clear.ml · GitHub: github.com/allegroai/clearml
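
The "two lines of code" refer to ClearML's Task.init hook; a minimal sketch (the project and task names are illustrative):

```python
from clearml import Task

# These two lines attach ClearML experiment tracking to a script.
task = Task.init(project_name="examples", task_name="first-experiment")

# Optional: log a metric to the ClearML dashboard.
task.get_logger().report_scalar("loss", "train", value=0.5, iteration=1)
```
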
Chinchilla

Chinchilla, developed by DeepMind and touted as a GPT-3 killer, is built on 70 billion parameters but trained on four times more data than Gopher. Notably, this GPT-3 alternative outperformed Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG on several downstream evaluation tasks. It also requires considerably less computing power for fine-tuning and inference.
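
The underlying insight is the paper's compute-optimal scaling rule, roughly 20 training tokens per parameter; a back-of-the-envelope sketch:

```python
# Chinchilla's heuristic: ~20 training tokens per model parameter.
params = 70e9            # Chinchilla's parameter count
tokens_per_param = 20    # approximate compute-optimal ratio
print(f"{params * tokens_per_param:.1e} tokens")  # ~1.4e12 (1.4T tokens)
```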

Bloom

Developed by a group of over 1,000 AI researchers, Bloom is an open source multilingual language model considered one of the best alternatives to GPT-3. It has 176 billion parameters, a billion more than GPT-3, and training it required 384 graphics cards, each with more than 80 gigabytes of memory.
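
Because the weights are openly released, Bloom can be loaded directly; a minimal sketch using a small released variant, bigscience/bloom-560m, since the full 176B model far exceeds a single GPU's memory (the prompt is illustrative):

```python
# Sketch: multilingual generation with a small BLOOM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tok("Le logiciel libre est", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```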

BERT (Model: Large Language Model)

Google deserves credit for coming up with a neural network-based technique for NLP pre-training; the outcome is BERT, which stands for Bidirectional Encoder Representations from Transformers. This GPT-3 alternative has two versions: BERT Base, which uses 12 layers of transformer blocks and 110 million trainable parameters, and BERT Large, which uses 24 layers and 340 million trainable parameters.

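Both published checkpoints are a download away; a minimal sketch loading the 12-layer Base variant with the Hugging Face transformers library:

```python
# Sketch: encode a sentence with BERT Base (12 layers, 110M parameters).
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tok("Open source AI tools", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```
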
AlexaTM

Amazon is not far behind when it comes to exploring this technology: it has unveiled its own large language model with 20 billion parameters, AlexaTM. Alexa Teacher Models (AlexaTM 20B) is a sequence-to-sequence (seq2seq) language model with state-of-the-art few-shot learning capabilities. What makes it different from others is that it has both an encoder and a decoder, which increases performance on tasks such as machine translation.