Introduction to Large Language Models

The emergence of Large Language Models (LLMs) such as OpenAi’s ChatGPT and Google’s Gemini has raised a lot of questions. What is Artificial Intelligence? Is it dangerous? Will AI replace my job? What does the future hold for AI? Will AI solve all our problems? Is AI sentient? Let’s take a deeper look into the inner workings of Large Language Models to answer some of these questions.

What is AI?

As pointed out in one humorous tweet: “If it is written in Python, it’s probably machine learning. If it is written in PowerPoint, it’s probably AI”

Technically, Artificial Intelligence (AI) is a broad term that covers all kinds of systems and technologies that perform tasks that look smart from a human point of view. Machine Learning and Large Language Models are both types of AI but with different implementations and use cases. In practice, terms like AI and LLMs are often used interchangeably, with AI having slightly more marketing oriented connotation as the general public have more of an awareness of what AI is, whereas terms like LLM and Machine Learning are less well known.

Machine learning is a technique where machines are said to be “learning” from the existing data. It is done by employing different mathematical methods to minimize the difference between computer prediction and a ground truth. For example, you may have a training dataset of pictures, some containing pictures of cats, and others containing pictures of dogs, each picture also being labeled by a human as to whether it is of a dog or cat. The algorithm is trained using this data and then subsequently can be shown an unlabelled picture not from the original training dataset and it will be able to identify if the image is of a cat or a dog. This type of learning is called supervised machine learning.

Deep learning is a type of machine learning which employs artificial neural networks, and currently considered to be state of the art technique. Deep learning can train itself from its own environment and its previous mistakes. For many tasks deep neural networks deliver the best results compared to other machine learning techniques but require much larger amounts of data and take longer to train itself than supervised machine learning.

What is a Large Language Model?

A Large Language Model is based on the same principle as word suggestions on your smartphone. At every step, the job of the LLM is to predict the next word in a sentence. After predicting one word, the process is repeated again: the original prompt with the predicted next word is fed back into LLM to do the next word prediction.

But how is it possible that by just predicting the next word AI is able to write essays and answer questions? Ilya Sutskever, Chief Scientist at OpenAI, conceptually explained it in one of his interviews. Imagine you’re reading a detective story, and on the last page of the book, there is a sentence: “And the name of the killer is ___”. In order to make a good prediction of this next word, you really need to take into account all the information contained in the book. Thus, by better and better predicting the next word an LLM essentially compresses the information about the World into some model.

This is a lossy compression (that is, some of the information in the text it analyses is lost when the model is created). Otherwise, it would require memorizing the whole Internet, and would take significantly more computation and lead to longer prediction times. Due to this lossy nature of compression, LLMs are known to completely make up thing facts that do not exist in the real world, or just provide wrong answers from time to time. This phenomenon is known as hallucinations. It hasn’t been fully solved in any of the modern LLMs.

How was AI created?

The progress in AI was achieved through decades of research, going back to the 1960s and 1970s. But it only became possible when computer hardware evolved to a stage of being quite powerful and yet relatively cheap. Even today, the major bottleneck in Large Language Models progress is the limited supply of GPUs (graphics processing units). GPUs were originally designed for rendering graphics for video games, but their design also happens to be perfect for the type of operations that AI relies upon, so they are now heavily used in this area,

In machine learning, practitioners usually try to find a balance between a model memorizing the data, and generalizing well for unseen data. Sometimes a model just memorizes the data that it saw in the past, but performs poorly on the test or unseen data, this is called overfitting. On the other hand, when there is not enough data, a model doesn’t learn all the essential underlying characteristics, this is called underfitting. It was observed that with deep neural networks, when the amount of data available increases, but also the number of network parameters or neurons increases, the model doesn’t reach this overfitting stage, and performs well in general.

It was hypothesized that the more parameters the model has, the better it will perform. This direction of development was taken by OpenAI with its famous GPT models. With every new model generation you can see a significant increase of the number of model parameters. It was also observed that, with an increasing number of parameters, models often obtain new capabilities such as arithmetic operations, basic math, language translation, etc. On the other hand, latest LLMs already processed most of the publicly available Internet. In order to feed in more data, researchers have turned to other sources of information such as videos, podcasts, pdfs, sensor data etc.

Another breakthrough achieved by OpenAI, which led to the creation of ChatGPT, is adding the RLHF stage after the general model training. RLHF stands for Reinforcement learning from human feedback. The base language model is just some text essence or statistics extracted from the language. In order for it to perform useful tasks, and produce valuable output, you will need to provide many examples of “good responses” for particular prompts. During this stage many human raters rate the answers of the LLM to different types of prompts, thus giving it a sense of what a human would expect to be a good and appropriate answer.

LLM Security

Although all LLMs go through a so-called alignment stage, and should not provide answers to sensitive questions, people constantly come up with new ways to jailbreak LLMs or circumvent their security guardrails. In one famous example, a person was able to get a recipe for napalm from LLM by telling it an imaginary story about their grandmother.

The more complex method of jailbreaking the LLM is to feed it with a prompt containing a specific “gibberish” suffix. The researchers claimed they can always find this special type of a suffix to jailbreak a model for any prompt. In general, LLMs are very susceptible to these types of attacks.

AI Risks for Society

The group of activists from the Center for Humane Technology in their The AI Dilemma documentary warn that widespread adoption of AI could have very detrimental effects on society. The authors explore the negative effects that social media can have on the lives of ordinary people, and conclude that Artificial Intelligence, being a more powerful technology, could lead to more dangerous consequences such as automated fake news creation, automated hacking, synthetic relationships etc. Many people agree that, despite many benefits, AI also potentially poses some great risks and should be regulated by governments worldwide.

Is AI sentient?

AI is not sentient, and doesn’t have consciousness. It is a statistical model created through the use of mathematical methods, primarily calculus, from a large number of observations or examples, such as text, images, video etc. In our opinion, the concept of being sentient can only currently be applied to biological life forms. If you prefer a simpler analogy, AI could be thought of as a very complex mathematical equation. It doesn’t exhibit properties specific to animals and humans. However, AI is known as being able to pass a Turing test. A Turing test is a form of test where a human evaluator tries to identify if they are talking/chatting with a machine or with a real person. Modern LLMs are complex enough to make an impression of a real person participating in a discussion and convince the evaluator they are human.


Edd Dawson

Founder KeywordsPeopleUse