LLM Terminology
Essential terms and concepts for working with AI language models
LLM (Large Language Model)
An AI model trained on vast amounts of text data to understand and generate human-like text. Examples include GPT-4, Claude, and LLaMA.
Prompt
The input text you provide to the AI model. A well-crafted prompt is key to getting good responses.
Context Window
The maximum amount of text (measured in tokens) that a model can process at once. Larger context windows allow for longer conversations.
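As a minimal sketch of what the context window constrains, the check below verifies that a prompt plus a reserved response budget fits within a model's limit (all token counts and the window size here are hypothetical):

```python
def fits_context(prompt_tokens, max_response_tokens, context_window):
    """Return True if the prompt plus the reserved response budget
    fits within the model's context window (sizes are hypothetical)."""
    return prompt_tokens + max_response_tokens <= context_window

# A 6,000-token prompt plus a 2,000-token response budget fits in 8,192
ok = fits_context(prompt_tokens=6000, max_response_tokens=2000, context_window=8192)
```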
Temperature
A parameter controlling response randomness. Lower values produce focused outputs, higher values generate creative responses. See our Temperature guide for details.
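As a rough sketch of how temperature works under the hood, here is a toy temperature-scaled softmax over made-up logits (the logit values and token count are illustrative, not from any real model):

```python
import math

def token_distribution(logits, temperature=1.0):
    """Convert raw logits to a probability distribution,
    dividing by temperature before the softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                  # made-up logits for three tokens
low = token_distribution(logits, temperature=0.2)   # sharp: top token dominates
high = token_distribution(logits, temperature=2.0)  # flat: more variety when sampling
```

Low temperature concentrates probability on the most likely token; high temperature flattens the distribution, so sampling picks less likely tokens more often.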
System Instructions
Special instructions that set the AI's behavior and personality throughout the conversation. See our System Instructions guide.
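Many chat APIs accept system instructions as a dedicated role in the message list. The payload below is a hypothetical example of that common convention, not the exact schema of any particular provider:

```python
# Hypothetical chat request payload: the "system" message sets behavior
# for the whole conversation, before any user messages
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Explain tokens in one sentence."},
]
```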
Hallucination
When an AI model generates plausible-sounding but incorrect information. Always verify critical information.
Fine-tuning
Training an existing model on specific data to specialize it for particular tasks or domains.
Embeddings
Numerical representations of text that capture semantic meaning. Used for semantic search and recommendations.
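Semantic search over embeddings typically compares vectors with cosine similarity. A minimal sketch, using toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and these values are invented):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: semantically similar words should have similar vectors
cat = [0.8, 0.1, 0.2]
kitten = [0.75, 0.15, 0.25]
car = [0.1, 0.9, 0.3]
closer = cosine_similarity(cat, kitten) > cosine_similarity(cat, car)
```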
Few-shot Learning
Providing examples to help the model understand a task. Zero-shot (no examples), one-shot (one example), or few-shot (multiple examples).
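One way to see the difference is in how the prompt is assembled. This sketch builds a prompt that is zero-shot with an empty example list, one-shot with one example, and few-shot with several (the task and examples are invented for illustration):

```python
def build_prompt(task, examples, query):
    """Assemble a prompt: zero-shot if examples is empty,
    one-shot with one example, few-shot with several."""
    parts = [task]
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

few_shot = build_prompt(
    "Classify the sentiment as positive or negative.",
    [("I loved it!", "positive"), ("Terrible service.", "negative")],
    "The food was amazing.",
)
```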
Chain of Thought (CoT)
A prompting technique where you ask the model to show its reasoning step-by-step, improving accuracy for complex problems.
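The technique is mostly a matter of phrasing. These hypothetical prompts contrast a direct question with a chain-of-thought version of the same question:

```python
# Hypothetical prompts: the CoT version asks for visible reasoning,
# which tends to improve accuracy on multi-step problems
direct = "What is 17 * 24?"
cot = (
    "What is 17 * 24? "
    "Think step by step and show your reasoning before giving the final answer."
)
```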
RAG (Retrieval-Augmented Generation)
Combines information retrieval with text generation. The model searches for relevant information before generating a response.
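As a minimal sketch of the retrieve-then-generate flow, this toy retriever ranks documents by word overlap with the query (real RAG systems use embedding similarity instead), then places the best match in the prompt:

```python
def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (a crude stand-in
    for the embedding-based retrieval real RAG systems use)."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

docs = [
    "The context window is measured in tokens.",
    "Streaming sends tokens as they are generated.",
]
question = "How is the context window measured?"
context = retrieve(question, docs)
# The retrieved passage is prepended so the model can ground its answer
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: {question}"
```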
Model Parameters
The internal values that define an AI model's behavior. Model size is usually expressed in billions (B) of parameters. More parameters generally mean more capability.
Top-p (Nucleus Sampling)
Controls response diversity by restricting sampling to the smallest set of tokens whose cumulative probability reaches p. Works alongside temperature.
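A sketch of the nucleus selection step, using an invented next-token distribution (real vocabularies have tens of thousands of tokens):

```python
def nucleus_filter(probs, p=0.9):
    """Keep the smallest set of token indices whose cumulative
    probability reaches p; sampling then happens within this nucleus."""
    ranked = sorted(enumerate(probs), key=lambda x: x[1], reverse=True)
    kept, cumulative = [], 0.0
    for idx, prob in ranked:
        kept.append(idx)
        cumulative += prob
        if cumulative >= p:
            break
    return kept

probs = [0.5, 0.3, 0.15, 0.05]   # hypothetical next-token probabilities
wide = nucleus_filter(probs, p=0.9)    # three tokens reach 0.95 >= 0.9
narrow = nucleus_filter(probs, p=0.5)  # the top token alone reaches 0.5
```

Lower p shrinks the nucleus toward the single most likely token; higher p admits more of the tail.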
Max Tokens
The maximum number of tokens the model will generate in its response. Prevents excessively long outputs. See our Tokens guide.
Tokenizer
The tool that breaks text into tokens for the model to process. Different models use different tokenizers.
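For intuition only, here is a toy whitespace tokenizer; real tokenizers (BPE, WordPiece, and similar) split text into subword units, so token counts vary between models and rarely equal word counts:

```python
def naive_tokenize(text):
    """A toy whitespace tokenizer: one 'token' per word.
    Real subword tokenizers would split differently."""
    return text.split()

tokens = naive_tokenize("Different models use different tokenizers.")
count = len(tokens)  # 5 rough word-level tokens
```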
Streaming
A mode where the AI sends tokens as they are generated rather than waiting for the full response, making interactions feel more natural.
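A minimal sketch of the consumer side of streaming, using a generator over a made-up, pre-generated token list in place of a real API connection:

```python
import time

def stream_response(tokens, delay=0.0):
    """Yield tokens one at a time, as a streaming API would
    (the token list here is made up, not model output)."""
    for token in tokens:
        time.sleep(delay)        # simulates per-token generation time
        yield token

collected = []
for token in stream_response(["Hello", ",", " world", "!"]):
    collected.append(token)      # a UI would render each token immediately
text = "".join(collected)
```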
Multimodal
AI models that can process multiple types of data like text, images, audio, and video. NyxoChat supports PDFs, images, and code files.
Latency
The time delay between sending a prompt and receiving the first token of the response. Lower latency means faster responses.
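Time to first token is straightforward to measure from any token iterator. This sketch times a simulated stream (the 10 ms delay stands in for real model and network latency):

```python
import time

def time_to_first_token(stream):
    """Return the first token and the elapsed seconds until it arrived."""
    start = time.perf_counter()
    first = next(iter(stream))
    return first, time.perf_counter() - start

def fake_stream():
    time.sleep(0.01)             # simulated model/network delay before output
    yield "Hello"
    yield " there"

token, latency = time_to_first_token(fake_stream())
```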