LLM Terminology

Essential terms and concepts for working with AI language models

LLM (Large Language Model)

An AI model trained on vast amounts of text data to understand and generate human-like text. Examples include GPT-4, Claude, and LLaMA.

Prompt

The input text you provide to the AI model. A well-crafted prompt is key to getting good responses.

Context Window

The maximum amount of text (measured in tokens) that a model can process at once. Larger context windows allow for longer conversations.

Temperature

A parameter controlling response randomness. Lower values produce more focused, deterministic outputs; higher values produce more varied, creative responses. See our Temperature guide for details.
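
Under the hood, temperature rescales the model's raw token scores (logits) before sampling. A minimal illustrative sketch, not any provider's actual implementation:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by temperature before the softmax:
    # low temperature sharpens the distribution, high flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.2))  # sharply peaked on the top token
print(softmax_with_temperature(logits, 1.5))  # much closer to uniform
```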

System Instructions

Special instructions that set the AI's behavior and personality throughout the conversation. See our System Instructions guide.
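
In chat APIs, system instructions are usually passed as the first message. The field names below follow the common OpenAI-style schema and are an assumption; providers differ:

```python
# OpenAI-style message list (exact schema varies by provider).
messages = [
    {
        "role": "system",  # sets behavior for the whole conversation
        "content": "You are a concise technical assistant. Cite sources when possible.",
    },
    {"role": "user", "content": "Explain context windows in one paragraph."},
]
```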

Hallucination

When an AI model generates plausible-sounding but incorrect information. Always verify critical information.

Fine-tuning

Training an existing model on specific data to specialize it for particular tasks or domains.
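
Fine-tuning data is often supplied as chat-formatted examples, one JSON object per line in a JSONL file. The shape below mirrors the common OpenAI-style format and is illustrative (the product name is hypothetical); check your provider's docs:

```python
import json

# One hypothetical training example in chat JSONL format.
example = {
    "messages": [
        {"role": "system", "content": "You answer as Acme's support bot."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security > Reset password."},
    ]
}
print(json.dumps(example))  # one line like this per training example
```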

Embeddings

Numerical representations of text that capture semantic meaning. Used for semantic search and recommendations.
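
Semantic search works by comparing embedding vectors, most commonly with cosine similarity. A self-contained sketch with toy vectors (real embeddings come from an embedding model and have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Ranges from -1 (opposite) to 1 (same direction / similar meaning).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat = [0.8, 0.1, 0.2]
kitten = [0.75, 0.15, 0.25]
car = [0.1, 0.9, 0.3]
print(cosine_similarity(cat, kitten))  # high: related meanings
print(cosine_similarity(cat, car))     # lower: unrelated meanings
```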

Few-shot Learning

Providing examples in the prompt to help the model understand a task. Variants are named by example count: zero-shot (no examples), one-shot (one example), and few-shot (multiple examples).
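
A few-shot prompt simply embeds labeled examples before the new input. An illustrative sentiment-classification prompt:

```python
# Two labeled examples (few-shot), then the input to classify.
prompt = """Classify each review as positive or negative.

Review: "The battery lasts all day." -> positive
Review: "It broke after a week." -> negative
Review: "Setup was painless and fast." ->"""
```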

Chain of Thought (CoT)

A prompting technique where you ask the model to show its reasoning step-by-step, improving accuracy for complex problems.
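
In practice this can be as simple as appending a reasoning cue to the prompt. An illustrative example:

```python
prompt = (
    "A train covers 120 km in 1.5 hours, then 80 km in 1 hour. "
    "What is its average speed for the whole trip? "
    "Think step by step, then give the final answer."
)
# A chain-of-thought response should reason:
# 200 km total / 2.5 hours total = 80 km/h.
```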

RAG (Retrieval-Augmented Generation)

Combines information retrieval with text generation: relevant documents are retrieved (often from a vector database via embeddings) and supplied to the model before it generates a response.
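
A minimal RAG loop, where `embed`, `vector_store`, and `llm` are hypothetical placeholders for your embedding model, vector database, and chat model:

```python
def answer_with_rag(question, embed, vector_store, llm, k=3):
    query_vec = embed(question)                  # 1. embed the question
    docs = vector_store.search(query_vec, k=k)   # 2. retrieve top-k passages
    context = "\n\n".join(doc.text for doc in docs)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)                           # 3. generate a grounded answer
```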

Model Parameters

The learned weights that determine a model's behavior, typically counted in billions (B); a 70B model has roughly 70 billion parameters. More parameters generally mean more capability, at the cost of more compute.

Top-p (Nucleus Sampling)

Controls response diversity by restricting sampling to the smallest set of tokens whose cumulative probability exceeds p. Works alongside temperature.
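
An illustrative sketch of how that "nucleus" of tokens is selected (real implementations operate on logit tensors, and sampling then happens within the surviving set):

```python
def top_p_filter(probs, p=0.9):
    # Keep the smallest set of tokens whose cumulative
    # probability reaches p, starting from the most likely.
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token_id, prob in ranked:
        nucleus.append(token_id)
        cumulative += prob
        if cumulative >= p:
            break
    return nucleus

print(top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9))  # [0, 1, 2]
```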

Max Tokens

The maximum number of tokens the model will generate in its response. Prevents excessively long outputs. See our Tokens guide.
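
In request payloads this is usually a single parameter. The shape below follows the common OpenAI-style schema and is an assumption; names vary by provider:

```python
request = {
    "model": "gpt-4o",  # example model name
    "messages": [{"role": "user", "content": "Summarize this article."}],
    "max_tokens": 300,   # hard cap on response length, in tokens
    "temperature": 0.7,
}
```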

Tokenizer

The tool that breaks text into tokens for the model to process. Different models use different tokenizers.
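
For example, with tiktoken (the tokenizer library used by OpenAI models; other model families tokenize the same text differently):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Tokenization splits text into pieces.")
print(tokens)              # list of integer token IDs
print(len(tokens))         # token count, not character count
print(enc.decode(tokens))  # round-trips back to the original text
```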

Streaming

A mode where the model sends tokens as they are generated, rather than waiting for the complete response, making interactions feel more natural.
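
The client renders chunks as they arrive instead of waiting for the whole response. A toy stand-in for a streaming API:

```python
import time

def fake_stream(text):
    # Simulates a streaming API by yielding one token at a time.
    for token in text.split():
        time.sleep(0.05)  # simulate generation delay
        yield token + " "

for chunk in fake_stream("Streaming makes long answers feel responsive."):
    print(chunk, end="", flush=True)
print()
```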

Multimodal

AI models that can process multiple types of data like text, images, audio, and video. NyxoChat supports PDFs, images, and code files.

Latency

The delay between sending a prompt and receiving the first token of the response, often called time to first token. Lower latency means faster-feeling responses.
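
Time to first token is the usual way to measure it with a streaming API. A sketch where `stream_response` is a hypothetical placeholder for any streaming call:

```python
import time

def time_to_first_token(stream_response, prompt):
    # Returns seconds from sending the prompt to the first chunk.
    start = time.perf_counter()
    for _chunk in stream_response(prompt):
        return time.perf_counter() - start  # stop at the first token
    return None  # the stream produced no tokens
```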