Computer Vision
Simple Definition
Computer vision is the field of AI that enables computers to “see” and interpret visual information — recognizing what’s in images and video, understanding spatial relationships, and making sense of the visual world.
It’s what allows your phone to recognize your face, a Tesla to identify pedestrians, and Google Photos to let you search your pictures by content.
What Computer Vision Can Do
- Object detection: identify and locate objects in images (“there’s a cat in the top-left corner”)
- Image classification: categorize an entire image (“this is a photo of a beach”)
- Facial recognition: identify specific individuals from photos or video
- Scene understanding: interpret the full context of an image, including how the objects in it relate to one another
- Optical character recognition (OCR): extract text from images and documents
- Medical imaging: detect tumors and other anomalies in scans to support diagnosis
- Video analysis: track objects, detect events, and analyze motion over time
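To make the difference between the first two tasks concrete, the sketch below shows the kind of output each one might produce. It is illustrative only: the `Detection` class, the `iou` helper, and every label, score, and coordinate are made up for this example and do not come from any particular library or model. Intersection over union (IoU) is a common way to measure how closely a predicted box matches a reference box.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure for illustration).

    `box` is (x_min, y_min, x_max, y_max) in pixels, with the origin at the
    top-left corner of the image. This is a common convention, not a universal one.
    """
    label: str
    score: float
    box: tuple


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Image classification: a single label for the whole image (made-up values).
classification = {"label": "beach", "score": 0.93}

# Object detection: zero or more labeled boxes (made-up values).
detections = [Detection("cat", 0.88, (10, 20, 110, 120))]

# Compare the predicted cat box with a hand-drawn reference box.
reference_box = (20, 20, 120, 120)
overlap = iou(detections[0].box, reference_box)
```

Here the two boxes are the same size and offset by 10 pixels horizontally, so `overlap` comes out a little above 0.8. An IoU of 1.0 would mean a perfect match and 0.0 no overlap at all.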
How It Works
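Modern computer vision relies on deep learning, particularly convolutional neural networks (CNNs) and, increasingly, vision transformers (ViTs). These models are typically trained on very large image datasets, often millions of labeled examples, and learn to extract visual features at progressively higher levels of abstraction. In a CNN, for example, early layers tend to respond to edges and textures, while deeper layers respond to object parts and whole objects.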
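The basic operation inside a CNN can be sketched in a few lines of plain Python. The example below slides a small filter (a "kernel") over a toy image and records how strongly each patch matches it. The kernel used here is the classic hand-written Sobel filter for vertical edges; in a trained CNN the kernel values are learned from data rather than chosen by hand, and real models stack many such filters across many layers. The function name `convolve2d` and the toy image are assumptions made for this illustration.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map.

    As in most deep learning libraries, the kernel is not flipped, so this is
    technically cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Multiply the patch by the kernel element-wise and sum the result.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


# Toy 6x6 grayscale image: dark left half (0.0), bright right half (1.0).
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]

# Sobel kernel that responds to left-to-right changes in brightness.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

edges = convolve2d(image, sobel_x)
for row in edges:
    print(row)  # each row is [0.0, 4.0, 4.0, 0.0]
```

The response is zero over the flat dark and bright regions and peaks where the patch straddles the boundary between them, which is what "detecting an edge" means at this level.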
Applications in Everyday Life
- Smartphone face unlock
- Google Lens — identify objects by pointing your camera
- Self-driving car perception
- Industrial quality control
- Security and surveillance cameras
- Augmented reality filters
Computer Vision in Multimodal AI
Modern AI assistants like GPT-4o and Claude can “see” — you can send them images and they’ll describe, analyze, or answer questions about what they see. This is computer vision integrated into conversational AI.
Related Terms
- Deep Learning — the technology powering computer vision
- Neural Network — the architecture used in vision models
- Multimodal AI — AI that combines vision with language and other modalities
- Artificial Intelligence — the broader field computer vision belongs to