Reference · 2026

AI glossary

133+ AI & ML terms defined in plain English. From LLMs and RAG to Transformers and Fine-tuning.

A 13 TERMS

A mathematical function applied to the output of a neural network node (neuron) that determines whether, and how strongly, it activates. Common examples include ReLU, Sigmoid, and Tanh. Activation functions introduce non-linearity, enabling networks to learn complex patterns.
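As a rough illustration of the three activation functions named above, here is a minimal Python sketch using only the standard library. The function names are chosen for this example and are not taken from any particular framework.

```python
import math

def relu(x: float) -> float:
    """ReLU: passes positive inputs through and maps negatives to zero."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x: float) -> float:
    """Tanh: squashes any real input into the range (-1, 1)."""
    return math.tanh(x)

# Compare the three functions on a negative, zero, and positive input.
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  sigmoid={sigmoid(x):.3f}  tanh={tanh(x):.3f}")
```

None of the three is a straight line, which is what lets stacked layers represent more than a single linear transformation.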

AI systems designed to autonomously plan, execute, and iterate on complex multi-step tasks with minimal human guidance. Agentic AI combines LLMs with tool use, memory, and planning capabilities to accomplish goals like software development, research, and business automation.

A hypothetical form of AI that possesses the ability to understand, learn, and apply knowledge across any intellectual task that a human can perform. Unlike narrow AI, AGI would exhibit flexible reasoning, creativity, and adaptability without task-specific training.

An autonomous software entity that perceives its environment, makes decisions, and takes actions to achieve specific goals. Modern AI agents use large language models to plan multi-step tasks, call tools, browse the web, and execute code with minimal human intervention.

The branch of ethics that examines the moral implications of artificial intelligence, including bias and fairness, privacy, accountability, transparency, and the societal impact of automation. AI ethics frameworks guide responsible development and deployment.

An interdisciplinary field ensuring AI systems operate reliably, securely, and beneficially. AI safety research covers robustness to adversarial attacks, interpretability, containment, and long-term existential risk from increasingly capable systems.

The process of labeling or tagging data (text, images, audio, video) with metadata that supervised machine learning models use for training. High-quality annotation is essential for model accuracy and is often performed by human annotators or semi-automated tools.

A technique in neural networks that allows a model to dynamically focus on relevant parts of the input data when producing output. Self-attention in Transformers computes weighted relationships between all positions in a sequence, enabling parallel processing and long-range dependency capture.

AI systems capable of operating independently with minimal human oversight. These systems perceive their environment, make decisions, and take actions to achieve goals—ranging from self-driving cars to AI coding agents that write, test, and deploy software autonomously.

B 4 TERMS

The primary algorithm for training neural networks. It computes gradients of the loss function with respect to each weight by propagating errors backward through the network layers using the chain rule. An optimizer such as gradient descent then uses those gradients to update the weights and minimize prediction errors.
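To make the backward pass concrete, the sketch below works through a deliberately tiny network with one hidden unit and one output. The network shape, the squared-error loss, and all names are assumptions made for illustration; real frameworks automate these steps.

```python
import math

def forward(x, w1, w2):
    """Forward pass: input -> tanh hidden unit -> linear output."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def loss(x, target, w1, w2):
    """Squared error between the network output and the target."""
    _, y = forward(x, w1, w2)
    return (y - target) ** 2

def backward(x, target, w1, w2):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    h, y = forward(x, w1, w2)
    dloss_dy = 2.0 * (y - target)               # derivative of (y - target)^2
    dloss_dw2 = dloss_dy * h                    # y = w2 * h
    dloss_dh = dloss_dy * w2                    # error flowing into the hidden unit
    dloss_dw1 = dloss_dh * (1.0 - h * h) * x    # d tanh(z)/dz = 1 - tanh(z)^2
    return dloss_dw1, dloss_dw2

grad_w1, grad_w2 = backward(x=0.5, target=1.0, w1=0.8, w2=-0.3)
print(grad_w1, grad_w2)
```

A common sanity check is to compare these analytic gradients with finite-difference estimates of the same loss.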

A standardized test or dataset used to evaluate and compare AI model performance. Common benchmarks include MMLU (knowledge), HumanEval (coding), GSM8K (math), and ImageNet (vision). Benchmarks drive model development but can be gamed, so real-world evaluation remains important.

Systematic errors or prejudices in AI model outputs caused by imbalanced training data, flawed assumptions, or societal biases embedded in datasets. AI bias can lead to discriminatory outcomes in hiring, lending, criminal justice, and healthcare applications.

C 10 TERMS

A supervised learning task where the model assigns input data to predefined categories. Examples include email spam detection, image recognition (cat vs. dog), sentiment analysis (positive/negative), and medical diagnosis (malignant/benign).

An unsupervised learning technique that groups similar data points together based on their features. Common algorithms include K-means, hierarchical clustering, and DBSCAN. Used in customer segmentation, image grouping, document organization, and anomaly detection.
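As one example of the grouping idea, here is a minimal K-means sketch on one-dimensional data. The starting centroids are passed in by hand, which is an assumption for the sake of a deterministic example; practical implementations usually choose them randomly or with a seeding heuristic.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

With this data the two centroids settle on the low group and the high group after the first iteration.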

The maximum amount of text (measured in tokens) that a language model can process in a single interaction. Larger context windows allow models to consider more information. GPT-4 Turbo supports 128K tokens, Claude 3.5 Sonnet supports 200K tokens, and Gemini 1.5 Pro supports 1M+ tokens.

D 7 TERMS

Techniques for artificially expanding training datasets by creating modified versions of existing data. For images, this includes rotation, flipping, cropping, and color jittering. For text, it includes paraphrasing, back-translation, and synonym replacement.
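For images, two of the transformations mentioned above can be sketched on a plain nested list standing in for a pixel grid. This is only an illustration; the helper names are hypothetical, and real pipelines typically operate on array or tensor types.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Rotate the grid 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(col) for col in zip(*image[::-1])]

image = [
    [1, 2],
    [3, 4],
]
augmented = [image, flip_horizontal(image), rotate_90_clockwise(image)]
for variant in augmented:
    print(variant)
```

Each variant keeps the same label as the original image, so one labeled example becomes three.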

The process of assigning meaningful tags, categories, or annotations to raw data so supervised machine learning models can learn from it. Accurate labeling is critical for model quality and is one of the most time-consuming and expensive parts of ML pipelines.

A subset of machine learning that uses neural networks with many layers (hence "deep") to learn hierarchical representations of data. Deep learning powers breakthroughs in image recognition, speech processing, natural language understanding, and generative AI.

AI-generated synthetic media (typically video or audio) that realistically depicts people saying or doing things they never actually did. Deepfakes use deep learning techniques like GANs and autoencoders, raising concerns about misinformation, fraud, and consent.

A class of generative AI models that create data (usually images) by learning to reverse a gradual noising process. Starting from pure noise, the model iteratively denoises to produce high-quality outputs. DALL-E 3, Stable Diffusion, and Midjourney use diffusion models.

In a Generative Adversarial Network (GAN), the discriminator is the model that tries to distinguish between real data and fake data produced by the generator. The adversarial training between generator and discriminator drives both to improve.

See Knowledge Distillation. The process of training a smaller, more efficient student model to replicate the behavior of a larger teacher model. Distilled models such as DistilBERT, and the smaller Gemma 2 variants that were trained with distillation, run faster and cheaper while retaining most of the teacher model's capabilities.

E 6 TERMS

The deployment of AI algorithms directly on local devices (smartphones, IoT sensors, cameras) rather than in the cloud. Edge AI reduces latency, enhances privacy, and enables offline inference. Examples include on-device voice assistants and real-time image recognition.

Capabilities that appear in large AI models at certain scale thresholds but are absent in smaller models. Examples include chain-of-thought reasoning, in-context learning, and code generation. Emergent abilities suggest that scaling up models can unlock qualitatively new behaviors.

A neural network architecture with two parts: an encoder that processes input into a compressed representation, and a decoder that generates output from that representation. Used in machine translation, text summarization, and image captioning.

One complete pass through the entire training dataset during model training. Multiple epochs are typically needed for a model to converge. Too few epochs can cause underfitting; too many can cause overfitting. Learning rate schedulers often adjust the learning rate across epochs.

The degree to which AI model decisions can be understood and interpreted by humans. Explainable AI (XAI) techniques like SHAP values, attention visualization, and feature importance help build trust, satisfy regulations, and debug model behavior.

F 7 TERMS

In AI, fairness refers to ensuring models treat individuals and groups equitably regardless of protected attributes like race, gender, or age. Fairness metrics measure disparate impact, and bias mitigation techniques are applied during data collection, training, and post-processing.

The process of selecting, transforming, and creating input variables (features) from raw data to improve machine learning model performance. Good feature engineering requires domain expertise and can dramatically boost accuracy without changing the model architecture.

A machine learning approach where models are trained across multiple decentralized devices or servers holding local data, without exchanging raw data. Only model updates (gradients) are shared, preserving privacy. Used by Google Keyboard, Apple Siri, and healthcare applications.
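One common way to combine the shared updates is a weighted average, with each device's contribution weighted by how many local examples it trained on. The sketch below assumes that scheme (often called federated averaging); the function name and data layout are illustrative.

```python
def federated_average(client_updates):
    """Combine per-client weight vectors into one global vector.

    client_updates is a list of (weights, num_examples) pairs. Only these
    vectors and counts reach the server; the raw training data never does.
    """
    total_examples = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total_examples
        for i in range(dim)
    ]

updates = [
    ([1.0, 2.0], 1),   # client A trained on 1 example
    ([3.0, 4.0], 3),   # client B trained on 3 examples
]
print(federated_average(updates))
```

Client B has three times the data, so the result sits three quarters of the way toward its weights.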

The process of further training a pre-trained model on a smaller, task-specific dataset to adapt it for a particular use case. Fine-tuning adjusts the model's weights to improve performance on specific domains (legal, medical, coding) while leveraging general knowledge from pre-training.

G 8 TERMS

A framework consisting of two neural networks—a generator and a discriminator—trained in competition. The generator creates synthetic data while the discriminator evaluates authenticity. GANs produce realistic images, video, and audio, though they've been largely superseded by diffusion models for image generation.

In a GAN, the generator is the network that produces synthetic data (images, text, audio) from random noise. Its goal is to create outputs realistic enough to fool the discriminator. Generators learn to map from a latent space to the data distribution.

An optimization algorithm used to minimize the loss function during neural network training. It iteratively adjusts model weights in the direction that reduces error. Variants include stochastic gradient descent (SGD), Adam, and AdaGrad.
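The update rule can be sketched on a one-parameter toy problem. The loss function, starting point, and learning rate below are assumptions chosen so that the answer is easy to check by hand.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Toy loss: loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The minimum is at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

Each step shrinks the distance to the minimum by a constant factor here, so the result ends up very close to 3.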

The technique of anchoring AI model outputs to verified, factual sources of information. Grounding helps reduce hallucinations by connecting the model's responses to search results, databases, or documents. Google's Gemini uses grounding with Google Search.

H 1 TERM

Configuration settings for model training that are set before learning begins (as opposed to parameters learned during training). Examples include learning rate, batch size, number of layers, and dropout rate. Hyperparameter tuning significantly impacts model performance.

I 5 TERMS

The creation of new images from text descriptions (text-to-image), sketches, or other images using AI models. Leading tools include DALL-E 3, Midjourney, Stable Diffusion, and Adobe Firefly. Powered primarily by diffusion models and transformer architectures.

The process of using a trained AI model to make predictions or generate outputs on new, unseen data. Unlike training (which adjusts weights), inference runs the model forward. Inference speed, cost, and efficiency are critical for production AI applications.

J 2 TERMS

Techniques used to bypass the safety guardrails and content restrictions of AI models to elicit prohibited outputs. Jailbreak methods include role-playing prompts, encoding tricks, and multi-turn manipulation. AI developers continuously patch vulnerabilities as new jailbreaks are discovered.

K 2 TERMS

A model compression technique where a smaller "student" model is trained to replicate the behavior of a larger "teacher" model. Distillation transfers knowledge by matching soft probability distributions, producing efficient models suitable for edge deployment without significant accuracy loss.
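The "soft probability distributions" can be illustrated with a temperature-scaled softmax and a KL divergence between teacher and student outputs. This is a simplified sketch under those assumptions; practical recipes often add a standard label loss and extra scaling terms.

```python
import math

def softmax_t(logits, temperature):
    """Softmax with temperature; higher values give a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's soft targets to the student's outputs."""
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, [2.0, 1.0, 0.1]))  # student matches the teacher
print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # student disagrees
```

Training the student to minimize this value pushes its output distribution toward the teacher's.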

A structured representation of real-world entities and the relationships between them, stored as a network of nodes and edges. Knowledge graphs power search engines (Google), recommendation systems, and AI assistants by providing organized, queryable factual knowledge.

L 5 TERMS

A mathematical function that measures how far a model's predictions deviate from the actual target values. The training process aims to minimize this loss. Common loss functions include cross-entropy (classification), mean squared error (regression), and contrastive loss.
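Two of the loss functions named above can be written in a few lines. The sketch assumes plain Python lists for predictions and, for cross-entropy, a single example with one correct class.

```python
import math

def mean_squared_error(predictions, targets):
    """Average squared difference; typical for regression."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(predicted_probs, true_class):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(predicted_probs[true_class])

print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))
print(cross_entropy([0.9, 0.1], true_class=0))  # confident and correct: small loss
print(cross_entropy([0.1, 0.9], true_class=0))  # confident and wrong: large loss
```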

M 5 TERMS

Techniques for reducing AI model size and computational requirements while preserving accuracy. Methods include pruning (removing unimportant weights), quantization (reducing numerical precision), and knowledge distillation. Essential for deploying models on mobile and edge devices.

A neural network architecture that routes inputs to specialized sub-networks (experts) rather than processing through the entire model. Only a subset of experts activates per input, enabling massive model capacity while keeping compute costs manageable. Mixtral uses MoE, and GPT-4 is widely reported, though not officially confirmed, to use it as well.
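The routing step can be sketched as top-k gating: pick the k highest-scoring experts, normalize their scores, and mix only their outputs. In a real model the gate scores come from a learned router; here they are passed in directly, and the experts are toy functions, both assumptions for illustration.

```python
import math

def top_k_route(gate_scores, k=2):
    """Select the k highest-scoring experts and softmax over just those."""
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    m = max(gate_scores[i] for i in top)
    exps = {i: math.exp(gate_scores[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and combine their outputs by gate weight."""
    routing = top_k_route(gate_scores, k)
    return sum(weight * experts[i](x) for i, weight in routing.items())

# Four toy experts; only two are evaluated for any given input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x ** 2]
gate_scores = [0.1, 2.0, -1.0, 2.0]
print(top_k_route(gate_scores))
print(moe_forward(3.0, experts, gate_scores))
```

The unselected experts are never called, which is where the compute savings come from.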

Training AI models to process and relate information from multiple modalities (text, images, audio, video) simultaneously. Multimodal models like GPT-4V, Gemini, and LLaVA can understand images alongside text, enabling richer interactions and more capable AI applications.

N 3 TERMS

AI designed and trained for a specific, well-defined task. Unlike AGI, narrow AI cannot generalize beyond its domain. Examples include chess engines, spam filters, recommendation algorithms, and image classifiers. By most definitions, all currently deployed AI systems are forms of narrow AI.

A computing system inspired by the biological neural networks of the human brain. Composed of interconnected layers of nodes (neurons) that process information, neural networks learn to recognize patterns through training. They form the foundation of modern deep learning and AI.

O 3 TERMS

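AI models and tools released with open licenses allowing anyone to use, modify, and distribute them. Openly released models like Llama 3, Mistral, Stable Diffusion, and Whisper democratize AI access, enable customization, and foster community innovation, although some (Llama 3, for example) ship under licenses with usage restrictions rather than a standard open source license.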

When a model learns the training data too well—including noise and outliers—resulting in excellent training performance but poor generalization to new data. Overfitting is combated with techniques like regularization, dropout, data augmentation, and early stopping.

P 7 TERMS

The internal variables (weights and biases) of a neural network that are learned during training. Model size is often described by parameter count; GPT-4, for example, is unofficially estimated at around 1.8 trillion parameters. More parameters generally enable greater model capability but require more compute.

A measurement of how well a language model predicts a sample of text. Lower perplexity indicates better prediction quality. While useful for comparing models on the same dataset, perplexity alone doesn't capture practical qualities like helpfulness, safety, or instruction-following ability.
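Perplexity is commonly computed as the exponential of the average negative log-probability the model assigned to each actual token. The sketch below assumes those per-token probabilities are already available as a list.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood over the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that gives every actual token probability 0.25 is, on average,
# as uncertain as a uniform choice among 4 options.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
print(perplexity([0.9, 0.8, 0.95]))  # more confident predictions: lower perplexity
```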

The initial phase of training a foundation model on a large, diverse dataset to learn general knowledge and language patterns. Pre-training is computationally expensive (millions of dollars for frontier models) and produces a base model that is then fine-tuned for specific tasks.

Techniques and approaches that enable AI training and inference while protecting sensitive personal data. Methods include federated learning, differential privacy, homomorphic encryption, and secure multi-party computation, addressing GDPR and privacy regulations.

A model compression technique that removes less important weights or neurons from a trained neural network. Pruning reduces model size and inference time while maintaining most accuracy. Structured pruning removes entire filters/layers; unstructured pruning removes individual weights.
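A simple form of unstructured pruning is magnitude pruning, which zeroes out the weights with the smallest absolute values. The sketch below assumes a flat list of weights and a target sparsity fraction; the function name is illustrative.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    k = int(len(weights) * sparsity)   # how many weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        # The counter guards against removing too many when magnitudes tie.
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned

weights = [0.5, -0.05, 0.8, 0.01, -0.3, 0.02]
print(magnitude_prune(weights, sparsity=0.5))
```

In practice the pruned model is usually fine-tuned afterward to recover lost accuracy.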

Q 1 TERM

A model optimization technique that reduces the numerical precision of model weights (e.g., from 32-bit floating point to 8-bit or 4-bit integers). Quantization shrinks model size and speeds up inference, usually with only a small accuracy loss, enabling large models to run on consumer GPUs and mobile devices.
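One basic scheme is symmetric 8-bit quantization: divide every weight by a shared scale so the largest magnitude maps to 127, then round to an integer. The sketch below assumes that scheme and a single scale for the whole list; production methods often use per-channel scales or calibration data.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print(restored)
```

Each restored value differs from the original by at most about half of one scale step, which is the precision traded for the smaller storage size.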

R 10 TERMS

An AI system that predicts and suggests items a user may be interested in based on their behavior, preferences, and similarities to other users. Powers personalization on Netflix, Spotify, Amazon, YouTube, and social media platforms.

The practice of deliberately testing AI systems by attempting to elicit harmful, biased, or unsafe outputs. Red teams simulate adversarial attacks, jailbreaks, and edge cases to identify vulnerabilities before deployment, improving model safety and robustness.

A supervised learning task where the model predicts continuous numerical values (as opposed to categories). Examples include predicting house prices, stock returns, temperature, and customer lifetime value. Common algorithms include linear regression, random forests, and neural networks.
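For the simplest case, linear regression with one input, the best-fit line has a closed-form least-squares solution. The sketch below uses made-up data points that lie exactly on a line so the result is easy to verify.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative data generated from y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept)
print(slope * 5.0 + intercept)  # prediction for a new input x = 5
```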

The integration of AI with physical robots to enable autonomous perception, decision-making, and manipulation in the real world. AI-powered robots perform tasks in manufacturing, warehousing, surgery, agriculture, and domestic assistance.

S 13 TERMS

Empirical relationships showing how model performance improves predictably with increases in model size, training data, and compute. Discovered by researchers at OpenAI and DeepMind, scaling laws guide decisions about how to allocate resources for training frontier AI models.

A mechanism within transformers that computes the relevance of each element in a sequence to every other element. Self-attention enables models to capture long-range dependencies and contextual relationships, forming the core computational block of GPT, BERT, and other transformer models.
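The core computation is often written as scaled dot-product attention. The sketch below is a simplified, single-head version in plain Python. It takes queries, keys, and values directly as lists of vectors, whereas a real transformer derives them from the input through learned projection matrices.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """For each query, weight every value by query-key similarity."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Relevance of this position to every position, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three positions with 2-dimensional vectors, reused as queries, keys, and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(x, x, x):
    print(row)
```

Every position attends to every other position in one pass, which is why the computation parallelizes well.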

A training paradigm where models learn from unlabeled data by creating their own supervisory signals from the data structure itself. Examples include next-word prediction (GPT), masked word prediction (BERT), and contrastive learning (CLIP). Most modern foundation models use self-supervised learning.

An NLP technique that determines the emotional tone (positive, negative, neutral) of text. Used in social media monitoring, customer feedback analysis, brand reputation management, and market research. Modern sentiment analysis leverages transformer-based models for nuanced understanding.

A model architecture that transforms one sequence into another, using an encoder to process the input and a decoder to generate the output. Originally developed for machine translation, seq2seq powers text summarization, chatbots, and code generation.

An open-source text-to-image diffusion model released in 2022 by Stability AI together with the CompVis group at LMU Munich and Runway. It generates high-quality images from text prompts and can be run locally, fine-tuned, and extended. Stable Diffusion popularized AI art creation and spawned a vast ecosystem of models, tools, and communities.

A theoretical level of AI that vastly surpasses human intelligence in virtually all cognitive domains, including scientific creativity, general wisdom, and social skills. Superintelligence is a central topic in AI safety research and long-term risk assessment.

A machine learning approach where models are trained on labeled datasets—input-output pairs where the correct answer is known. The model learns to map inputs to outputs and generalizes to new data. Common tasks include classification, regression, and object detection.

Artificially generated data that mimics real-world data patterns without containing actual personal or sensitive information. Synthetic data is used to augment training datasets, protect privacy, address data scarcity, and test AI systems in scenarios where real data is unavailable or restricted.

T 13 TERMS

A parameter that controls the randomness of AI model outputs during text generation. Lower temperature (e.g., 0.1) produces more deterministic, focused responses; higher temperature (e.g., 1.0+) increases creativity and diversity but may reduce coherence. Temperature affects sampling probability distributions.
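Temperature is typically applied by dividing the model's raw scores (logits) by it before the softmax. The sketch below uses made-up logits for three candidate tokens; the function names are illustrative.

```python
import math
import random

def temperature_softmax(logits, temperature):
    """Convert logits to probabilities; temperature must be greater than 0."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token index according to the temperature-adjusted probabilities."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5]
cold = temperature_softmax(logits, 0.1)   # nearly all mass on the top token
hot = temperature_softmax(logits, 1.5)    # mass spread across the options
print([round(p, 4) for p in cold])
print([round(p, 4) for p in hot])
print(sample_token(logits, 1.5, random.Random(0)))
```

At low temperature the top-scoring token is chosen almost every time; at higher temperature the alternatives are sampled more often.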

The generation of images from natural language descriptions using AI models. Text-to-image systems like DALL-E 3, Midjourney, Stable Diffusion, and Adobe Firefly interpret prompts to create photorealistic images, artwork, designs, and illustrations from textual descriptions.

AI technology that converts written text into natural-sounding spoken audio. Modern TTS systems, such as those from ElevenLabs and PlayHT and open models like XTTS, achieve human-like voice quality with emotional expression, voice cloning, and multilingual support. Used in audiobooks, accessibility, virtual assistants, and content creation.

AI technology that generates video content from text descriptions. Models like Sora, Runway Gen-3, Pika, and Kling create cinematic videos with complex scenes, camera movements, and character animations from natural language prompts, revolutionizing video production.

The basic unit of text that language models process. A token can be a word, subword, or character depending on the tokenizer. "ChatGPT" might be one token, while "unbelievable" could be split into "un", "believ", "able". Token counts determine model input limits, pricing, and context window size.

An algorithm that breaks text into tokens for model processing. Different models use different tokenization strategies: BPE (Byte Pair Encoding), WordPiece, or SentencePiece. The tokenizer determines how text maps to numerical IDs the model can understand.
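To give a feel for how BPE builds its vocabulary, the sketch below performs a single merge step: count adjacent symbol pairs, then fuse the most frequent pair into one token. It is a toy version that starts from characters of one string; real tokenizers train on large corpora and apply many such merges.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Return the adjacent pair of tokens that occurs most often."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every occurrence of the pair with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("abababc")
pair = most_frequent_pair(tokens)
print(pair)
print(merge_pair(tokens, pair))
```

Repeating this step grows a vocabulary of progressively longer subword units, each of which is then assigned a numerical ID.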

The capability of AI models to interact with external tools and services—web browsers, calculators, code interpreters, databases, and APIs. Tool use transforms LLMs from passive text generators into active agents that can retrieve information, perform calculations, and execute actions.

Google's custom-designed AI accelerator chip optimized for machine learning workloads, particularly matrix operations used in neural networks. TPUs are used to train Google's largest models (Gemini, PaLM) and are available through Google Cloud for external developers.

The process of teaching an AI model to perform tasks by exposing it to data and adjusting its internal parameters to minimize prediction errors. Training involves forward passes (predictions), loss computation, and backward passes (weight updates via backpropagation).

The dataset used to teach machine learning models. Training data quality, quantity, diversity, and labeling accuracy directly impact model performance. For LLMs, training data includes web text, books, code, and academic papers, often totaling trillions of tokens for frontier models.

A technique where knowledge gained from training a model on one task is applied to a different but related task. Transfer learning enables fine-tuning pre-trained foundation models for specific applications with much less data and compute than training from scratch.

The dominant neural network architecture powering modern AI, introduced in the 2017 "Attention Is All You Need" paper. Transformers use self-attention mechanisms to process entire sequences in parallel, enabling efficient training on massive datasets. GPT, BERT, Gemini, Claude, and Llama are all transformer-based.

U 1 TERM

A machine learning approach where models find patterns and structures in unlabeled data without predefined outputs. Techniques include clustering (K-means, DBSCAN), dimensionality reduction (PCA, t-SNE), and anomaly detection. Used for customer segmentation, fraud detection, and data exploration.

V 4 TERMS

A generative model that learns to encode data into a continuous latent space and decode it back, enabling generation of new data samples. VAEs are used in image generation, drug discovery, anomaly detection, and as components within larger generative systems like Stable Diffusion.

AI technology that replicates a person's voice from audio samples, enabling the synthesis of new speech in that voice. Used in content creation, dubbing, accessibility, and entertainment. Tools like ElevenLabs and Resemble.AI can clone voices from just a few seconds of audio.

W 2 TERMS

Techniques for embedding invisible markers in AI-generated content (text, images, audio) to indicate it was created by AI. Watermarking helps combat misinformation, protects intellectual property, and enables content authentication. Companies like Google and OpenAI implement watermarking in their models.

The numerical values within a neural network that are adjusted during training to minimize prediction errors. Weights determine how input signals are transformed as they pass through network layers. The collection of all weights constitutes the model's learned knowledge and capabilities.

Z 1 TERM
