A Basic Understanding of the ChatGPT Model
A technical look at ChatGPT from OpenAI, the most talked-about AI chatbot of 2022.
What is ChatGPT?
ChatGPT is an AI chatbot developed by OpenAI. People ask it questions in natural language, and it generates answers that take the context of the conversation into account.

To use ChatGPT, you type a question into the chat interface. ChatGPT reads the question and generates a response in a conversational setting, so you can learn new things or just have a fun conversation. It can also write essays on topics ranging from simple to complex.
ChatGPT, released on Nov 30, 2022, is a variant of OpenAI's GPT (Generative Pretrained Transformer) family of language models. It is based on GPT-3.5, which uses a Transformer architecture.
What is the dataset used for training ChatGPT?
Like other language models, ChatGPT has been trained on large and diverse data sources, such as news articles, books, websites, and social media posts, to learn the patterns and structures of language.
How are ChatGPT models trained?
ChatGPT is built on the Transformer architecture introduced in Attention Is All You Need by Vaswani et al. The original Transformer follows an encoder-decoder structure with self-attention; GPT models keep only the decoder stack.

All the words of the input sequence are fed to the Transformer together and flow through the decoder stack in parallel, rather than one word at a time as in a recurrent network.
The Transformer consists of a series of self-attention layers, which process the input and produce output representations that capture the meaning and context of each word. Each self-attention layer computes a similarity score between every word and every other word in the sequence, then uses those scores as weights when combining the words' representations.
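To make the similarity-and-weighting idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not ChatGPT's actual implementation: the toy 2-dimensional word vectors are made up, and they are used directly as queries, keys, and values, whereas a real Transformer first passes them through learned projection matrices. The `causal` flag is also an assumption of this sketch; it shows the masked variant used by GPT-style decoders, where a word can only attend to itself and the words before it.

```python
import math

def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product, used here as the similarity between two vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over a short sequence.

    queries, keys, values: one vector (list of floats) per word.
    causal=True lets word i attend only to words 0..i.
    Returns (outputs, weights).
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Similarity of word i to every word, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        if causal:
            # Hide later words by giving them a score of -infinity.
            scores = [s if j <= i else float("-inf")
                      for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output for word i is the weighted average of the value vectors.
        out = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy vectors for a three-word sequence.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

outputs, weights = self_attention(x, x, x)
causal_outputs, causal_weights = self_attention(x, x, x, causal=True)

for row in causal_weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and says how much one word draws on every other word. With `causal=True`, the weights above the diagonal are zero, so the first word can only attend to itself.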
ChatGPT uses masked self-attention to mask out certain words or phrases…