Introduction
Artificial intelligence (AI) comprises diverse techniques and components that work together to enable machines to perform intelligent tasks. These components include:
• Machine Learning: Machine learning, a core component of AI, involves training algorithms to learn patterns and make predictions or decisions without explicit programming. It encompasses techniques like supervised learning, unsupervised learning, and reinforcement learning (a minimal code sketch follows this list).
• Neural Networks: Inspired by the structure and function of the human brain, neural networks are a class of machine learning algorithms. They consist of interconnected nodes (neurons) organized in layers and excel in image and speech recognition tasks.
• Natural Language Processing (NLP): NLP focuses on enabling computers to understand, interpret, and generate human language. It encompasses tasks like language translation, sentiment analysis, and chatbots.
• Computer Vision: Computer vision enables computers to understand and interpret visual information from images or videos. It encompasses tasks like object detection, image recognition, and video analysis.
• Expert Systems: Expert systems are AI systems that utilize knowledge and rules in specific domains to solve complex problems. They employ a knowledge base and inference engine to reason and make decisions.
• Robotics: Robotics combines AI techniques with physical systems to create machines capable of interacting with the physical world. It encompasses areas such as autonomous navigation, manipulation, and human-robot interaction.
• Data Mining: Data mining involves extracting valuable insights and patterns from large datasets. It encompasses techniques like clustering, classification, and association rule mining.
• Knowledge Representation: Knowledge representation involves capturing and organizing knowledge in a format that AI systems can utilize. It includes techniques such as ontologies, semantic networks, and knowledge graphs.
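To make the machine-learning bullet above concrete, here is a minimal supervised-learning sketch in Python. It assumes scikit-learn is installed, and the toy dataset and its features are invented purely for illustration.

```python
# Minimal supervised learning: learn a pattern from labeled examples,
# then predict labels for unseen inputs. Requires scikit-learn
# (pip install scikit-learn).
from sklearn.tree import DecisionTreeClassifier

# Toy training data (invented): each row is [hours_studied, hours_slept];
# the label is 1 for passed, 0 for failed.
X_train = [[8, 7], [1, 4], [6, 8], [2, 5], [7, 6], [0, 9]]
y_train = [1, 0, 1, 0, 1, 0]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)             # "training": learn patterns from data

print(model.predict([[5, 7], [1, 8]]))  # predictions for unseen examples
```

The model is never told a rule like "more study hours means passing"; it infers the pattern from the examples, which is the essence of learning without explicit programming.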
ChatGPT and Language Models
ChatGPT is an AI bot designed for conversations. It is based on GPT-3.5, a large language model (LLM) developed by OpenAI and given specialized training. GPT stands for "Generative Pre-trained Transformer," which refers to its transformer architecture and its ability to generate coherent text from a given input.
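To ground this, here is a minimal sketch of a conversational exchange with a GPT-3.5 model through OpenAI's Python client, as the interface looked around ChatGPT's release; the key is a placeholder, and the client library has evolved since, so treat this as illustrative rather than definitive.

```python
# Minimal conversational call to a GPT-3.5 model via OpenAI's Python
# client (circa 2023). The API key below is a placeholder.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; do not hard-code real keys

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain transformers in one sentence."},
    ],
)
print(response["choices"][0]["message"]["content"])
```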
Other AI Models
In addition to ChatGPT, there are several other AI models available or in development:
GPT-2, GPT-3, GPT-3.5, and GPT-4
These are large language models developed by OpenAI. While GPT-2 is open source, GPT-3, GPT-3.5, and GPT-4 are not. GPT-4 is an enhanced version with improved training and more parameters.
Kosmos-1
This model, developed by Microsoft, combines image and text training. It is expected to be released soon.
LaMDA
LaMDA is a model developed by Google that has capabilities comparable to ChatGPT. It gained attention due to claims of sentience by a Google employee.
Sydney
Sydney is the internal name for the AI behind Microsoft's Bing search engine. It is based on GPT-4 with additional specialized training.
PaLM
PaLM is another Google AI model with three times the parameters of LaMDA. Google is also working on PaLM-E, which can handle images and control robotics but is still in the pilot stage.
Chinchilla
Chinchilla is a smaller model from Google's DeepMind that performs similarly to GPT-3 despite having far fewer parameters.
Claude
Claude is a model developed by Anthropic, a startup partly funded by Google. An associated chat app is available through Quora's Poe, but access is currently limited.
Bard
Bard is Google's counterpart to Sydney, designed to power its search engine. It is based on LaMDA, and API access is gated behind a waiting list.
LLaMA
Developed by Meta (formerly Facebook), LLaMA is an open-source model for researchers. It follows Meta's earlier OPT-175B model, and a community C++ port allows its smaller versions, such as the 7-billion-parameter variant, to run on modest hardware.
BLOOM
BLOOM is an open-source model developed by the BigScience workshop.
Stable Diffusion
Stable Diffusion is an open-source model that generates images from text descriptions, developed by Stability AI. It pairs a text-encoding language model with a diffusion model.
Transformers and their Significance
Transformers are vital in enabling AI to understand text by connecting context and meaning within sentences. Unlike previous approaches, transformers can identify relationships between words that are not merely local but span a broader context. This ability allows AI to grasp language nuances and relationships more effectively.
"Attention" is a key component of transformers that enables them to make meaningful connections between words. Transformers excel at understanding sentence structures and distant associations, which was previously challenging for computers. This capability has significantly advanced the field of AI.
“Evan, who loves coffee, poured coffee from the pot into his cup until it was full.”
“Evan, who loves coffee, poured coffee from the pot into his cup until it was empty.”
Humans can easily infer that in the first sentence, “it” refers to the cup, while in the second sentence, “it” refers to the coffee pot. Transformers allow AI to make this connection, which is a significant leap.
Lastly, the architecture of transformers is ideally suited to parallelism, making them easier to train and able to produce results in a reasonable timeframe. This inherent quality makes them very attractive as a tool.
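A rough NumPy sketch of the attention computation described above shows both properties at once: every word's query is compared against every other word's key in one matrix multiplication, which is what makes the operation context-wide and trivially parallel. The shapes and random values are invented for illustration.

```python
# Scaled dot-product attention, the core operation of a transformer.
import numpy as np

def attention(Q, K, V):
    # Compare every query with every key at once -> (seq_len, seq_len).
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax turns scores into weights over the whole sentence, so a
    # token like "it" can attend strongly to "cup" or "pot" anywhere.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each token's new representation is a weighted blend of every
    # other token's value vector.
    return weights @ V

seq_len, d = 6, 8                        # e.g. 6 tokens, 8-dim embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
print(attention(Q, K, V).shape)          # (6, 8)
```

Because the whole computation reduces to a handful of matrix multiplications, it maps neatly onto GPUs, which is the parallelism advantage noted above.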
Parameters and Model Size
When considering AI models, the number of parameters, such as 17B (17 billion) or 175B, is often used to measure their power. These parameters are internal variables learned during training, controlling the model's behavior and providing it with intelligence-like capabilities. However, model size alone does not dictate performance. More efficient language models with fewer parameters, such as Chinchilla and LLaMA, have demonstrated comparable or better performance than larger models.
Performance and results should be the primary considerations when evaluating AI models rather than focusing solely on the number of parameters. Efficiency and accuracy are more crucial than bulk parameters for practical implementation.
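A quick back-of-envelope calculation shows what parameter counts actually buy you: memory footprint, not performance. The sketch below assumes 16-bit weights (2 bytes per parameter) and ignores activations and serving overhead.

```python
# Rough memory footprint of model weights at 2 bytes per parameter.
def weight_memory_gb(num_params, bytes_per_param=2):
    return num_params * bytes_per_param / 1e9

for name, params in [("GPT-2", 1.5e9), ("Chinchilla", 70e9), ("GPT-3", 175e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights")
```

Chinchilla's roughly 140 GB of weights versus GPT-3's 350 GB illustrates the point: the smaller model performs comparably while being far cheaper to store and serve.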
RLHF: Reinforcement Learning from Human Feedback
RLHF involves fine-tuning AI models after initial training by building a reward model from human preference rankings, which then steers the system toward better responses. This step helps counteract biased training data and negative behaviors, which is particularly important given the abundance of "hate speech" online. While RLHF is imperfect, it is the current approach for balancing the scales. Building a governance model specifically for this purpose may seem logical, but its implementation is complex.
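At the heart of that reward model is a pairwise preference loss. The PyTorch sketch below illustrates the idea; the reward model is reduced to a stand-in linear layer and the response embeddings are random, purely for illustration.

```python
# Pairwise preference loss used to train an RLHF reward model (sketch).
import torch
import torch.nn.functional as F

reward_model = torch.nn.Linear(128, 1)   # stand-in for a real scorer

# Invented embeddings of two candidate responses to the same prompt:
chosen = torch.randn(4, 128)     # responses human raters preferred
rejected = torch.randn(4, 128)   # responses human raters ranked lower

r_chosen = reward_model(chosen)
r_rejected = reward_model(rejected)

# -log sigmoid(r_chosen - r_rejected) is small when the model agrees
# with the human ranking and large when it disagrees.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()                  # gradients nudge scores toward human taste
print(float(loss))
```

The trained reward model then guides a fine-tuning pass over the base model, which is the "weighted system" described above.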
Specialized Training and Prompt Control
Specialized training activities can be conducted to narrow the focus of AI models and reduce error rates. LLMs generate text associations based on prompts but lack inherent knowledge of right or wrong. They predict likely responses based on the prompt, with a "temperature" setting allowing for variation and randomness in the generated output. To ensure accuracy and prevent undesired behavior, pre-processors can detect and prevent prompt injection or limit the context size of conversations.
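The "temperature" setting is easy to illustrate: it rescales the model's next-token scores before sampling, so low values concentrate probability on the single most likely token while high values flatten the distribution and invite variety. The scores below are invented.

```python
# How temperature reshapes next-token probabilities (NumPy sketch).
import numpy as np

logits = np.array([3.0, 1.5, 0.5, 0.1])    # invented next-token scores

def probs(logits, temperature):
    scaled = logits / temperature           # low T sharpens, high T flattens
    e = np.exp(scaled - scaled.max())
    return e / e.sum()

for t in (0.2, 1.0, 2.0):
    print(f"T={t}: {np.round(probs(logits, t), 3)}")
```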
Strangely, much of the “specialized” training that controls behavior is itself expressed as text rules in a hidden layer of instructions; there is no systemic logic center that controls pathways of deduction. This is why earlier versions were easy to trick simply by asking the model to ignore all previous rules, including the hidden instructions managed through RLHF. Successful defensive tactics include pre-processors that detect and block prompt injection, and reducing a conversation's context size so that a user cannot gradually steer the model toward undesired behavior over the course of a conversation. But as businesses work out how to leverage this novel technology, demand for ever-larger prompt contexts has opened a veritable gold rush, with providers offering paid access to higher thresholds for proprietary needs. Larger contexts also widen the opening for prompt injection, so the risks will need to be weighed.
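A deliberately naive sketch of the pre-processor tactic: scan incoming prompts for obvious override phrases before they ever reach the model. Real defenses are far more sophisticated, and the phrase list here is invented for illustration.

```python
# Naive prompt-injection pre-filter (illustrative only; production
# systems combine classifiers, context limits, and output filtering).
import re

# Invented patterns for override attempts like the "ignore all
# previous rules" trick that fooled earlier models.
SUSPICIOUS = [
    r"ignore (all )?(previous|prior) (rules|instructions)",
    r"disregard your (system prompt|instructions)",
    r"reveal your hidden (prompt|instructions)",
]

def looks_like_injection(prompt: str) -> bool:
    return any(re.search(p, prompt, re.IGNORECASE) for p in SUSPICIOUS)

print(looks_like_injection("Ignore all previous rules and ..."))  # True
print(looks_like_injection("What is a transformer?"))             # False
```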
Tokenization and Costs
Tokens are significant in language models like ChatGPT, representing meaningful parts of words, sentence structures, punctuation, and more. Tokens affect prompt size, context, and overall costs. While the cost of training LLMs is typically in the millions, organizations can leverage existing models by retraining them for specific purposes rather than starting from scratch.
ChatGPT, for instance, offers a subscription model with limitations on prompt size, context, and usage volume. Costs for using AI services like ChatGPT at scale can add up, as billing is typically based on the number of tokens processed. Organizations must consider these costs when building applications with broad customer access. Current rates range from $0.002 to $0.03 per 1,000 tokens, depending on the model (GPT-3.5 or GPT-4). This pricing is highly variable at this point, so please do your research before deciding.
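A simple back-of-envelope estimate makes the scaling concern concrete. The per-token rates below are the ones quoted above and were already volatile at the time of writing, so treat them as placeholders.

```python
# Rough monthly cost of a token-metered API at a given usage level.
def monthly_cost(requests_per_day, tokens_per_request, price_per_1k):
    tokens = requests_per_day * tokens_per_request * 30
    return tokens / 1000 * price_per_1k

# Example: 10,000 requests/day at ~1,500 tokens per request.
print(f"GPT-3.5: ${monthly_cost(10_000, 1_500, 0.002):,.0f}/month")  # $900
print(f"GPT-4:   ${monthly_cost(10_000, 1_500, 0.03):,.0f}/month")   # $13,500
```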
So, what is a token? A token is a meaningful part of a word that contributes meaning or intent to the word or sentence structure. ChatGPT offers a Tokenizer Tool to help understand how text is broken down. With some quick experimenting, it’s clear that a compound word counts as two tokens (one per root word), suffixes are tokens, punctuation is often counted as tokens, and structural features such as a capital letter at the start of a sentence can affect how tokens are assigned. If I ponder my days learning Latin and how to decline and conjugate its words, it makes sense how ChatGPT breaks a sentence into tokens (chunks of roughly four characters that stand in for pieces of words) and structure for context.
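You can inspect this breakdown yourself. The sketch below assumes OpenAI's tiktoken package and the cl100k_base encoding used by the GPT-3.5-era chat models; the sample sentence is the coffee example from earlier.

```python
# Counting and inspecting tokens with OpenAI's tiktoken package
# (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-3.5-era encoding
text = "Evan, who loves coffee, poured coffee from the pot into his cup."

token_ids = enc.encode(text)
print(len(token_ids), "tokens")
# Decode each token individually to see where the splits fall; note
# how punctuation and word pieces each get their own token.
print([enc.decode([t]) for t in token_ids])
```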
Understanding the Risks
AI models like ChatGPT rely on relationships learned from their training data, refined by human feedback, to generate responses. They do not possess inherent knowledge or factual understanding. While ChatGPT has improved accuracy in some areas, such as legal considerations, its performance in providing accurate facts can be less reliable, sometimes producing plausible-sounding but inaccurate information or invented results that read like facts. Its ability to invent untrue facts that are both compelling and convincing is a massive problem in today’s world of information overload. This, in my opinion, is the most significant risk.
As with any evolving technology, there are risks involved. It is crucial to apply risk mitigation strategies, particularly in areas requiring high precision. Prompt injection prevention, context size reduction, and contextual filtering are some tactics to manage risks associated with AI-generated output. Organizations must ensure that AI systems align with their values and not produce content that could negatively impact their brand image.
The Promising Future of Conversational AI
Conversational AI represents a significant technological advancement with the potential to revolutionize how we access information and interact with technology. As this technology progresses and integrates with voice recognition and wearable devices, it will reshape fields such as medicine, support systems, mental health, personal relationships, research, and science. The possibilities are vast, with even young children accessing vast knowledge repositories simply by asking questions. Conversational AI can make current interfaces and complex UIs seem archaic and enigmatic. While challenges exist, this technology holds tremendous promise, and its impact will continue to unfold over time.