What Is Pydantic and Why Should You Use It?

If you’ve been building Python applications, especially anything involving APIs or AI, you’ve probably run into a frustrating problem: bad data sneaking into your code.

Maybe a function expected an integer but got a string. Maybe an API response came back with missing fields. Maybe you spent 30 minutes debugging only to realize a dictionary key was misspelled. These kinds of bugs are silent, annoying, and surprisingly common in Python.

So how do you fix this? Meet Pydantic.


What Is Pydantic?

Pydantic is a Python library for data validation using type hints. You define what your data should look like using Python classes, and Pydantic makes sure the data actually matches, at runtime.

“Pydantic is the most widely used data validation library for Python. It uses Python type hints to validate data, serialize it, and generate JSON schemas, all with a clean, Pythonic API.”

Pydantic Docs

Think of It Like a Bouncer at a Club

Imagine your function is a club, and the data coming in is a guest. Without Pydantic, anyone walks in: no ID check, no guest list. With Pydantic, there’s a bouncer at the door. Wrong type? Rejected. Missing field? Rejected. Everything checks out? Come on in.


A Simple Example

Let’s start with the basics. Here’s how you’d define a simple data model with Pydantic:

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

Now let’s use it:

from pydantic import ValidationError

# This works perfectly
user = User(name="Jorge", age=28, email="jorge@example.com")
print(user.name)  # Jorge

# Pydantic even converts compatible types (this is its default, lax mode)
user2 = User(name="Dave", age="30", email="dave@example.com")
print(user2.age)  # 30 (converted from string to int!)

# This will raise a validation error
try:
    bad_user = User(name="Test", age="not a number", email="test@example.com")
except ValidationError as e:  # catch Pydantic's own error, not every Exception
    print(e)
    # The message names the field (age) and says:
    # Input should be a valid integer, unable to parse string as an integer

That’s it. You define a class, add type hints, and Pydantic handles the rest. No more writing manual if isinstance(...) checks everywhere.
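
If you want to act on a failure instead of just printing it, the error object can also be read as data. Here is a minimal sketch, assuming Pydantic v2; the collect_errors helper is a name I made up for illustration, not part of Pydantic:

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int
    email: str


def collect_errors(data: dict) -> list[tuple[str, str]]:
    """Return (field, error type) pairs, or an empty list if the data is valid."""
    try:
        User.model_validate(data)
    except ValidationError as exc:
        # Each entry in exc.errors() is a dict with "loc", "type", and "msg" keys.
        return [(".".join(str(p) for p in err["loc"]), err["type"]) for err in exc.errors()]
    return []


# Wrong type for age, and email is missing entirely.
problems = collect_errors({"name": "Test", "age": "not a number"})
for field, error_type in problems:
    print(field, error_type)
```

Both problems are reported in a single pass, so the caller sees everything that is wrong with the input rather than just the first failure.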


Why Pydantic Matters

I started paying attention to Pydantic when I noticed it’s used everywhere in the AI and data engineering ecosystem:

  • FastAPI uses Pydantic for request/response validation
  • LangChain uses Pydantic for structured outputs from LLMs
  • PydanticAI is a whole agent framework built on top of it
  • OpenAI’s Python SDK can take a Pydantic model as the response format and parse the model’s reply into it

Benefits of using Pydantic:

  • Catches bugs early by validating data at runtime
  • Self-documenting code — your models describe the data shape
  • Automatic type coercion — converts compatible types for you
  • JSON serialization — convert to/from JSON with one method call
  • IDE support — full autocomplete and type checking
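
To make the JSON point concrete, here is a small round-trip sketch, assuming Pydantic v2 method names. It redefines the User model from earlier so it runs on its own:

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int
    email: str


user = User(name="Jorge", age=28, email="jorge@example.com")

# Serialize to a JSON string...
payload = user.model_dump_json()

# ...and parse it back, validating every field on the way in.
restored = User.model_validate_json(payload)

print(payload)
print(restored == user)  # True
```

Models compare equal when they have the same type and the same field values, which is why the round trip can be checked with a plain ==.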

Going Deeper: Nested Models and Validators

Pydantic really shines when your data gets more complex. Let’s look at nested models:

from pydantic import BaseModel, field_validator
from typing import List

class Address(BaseModel):
    street: str
    city: str
    country: str = "Canada"  # Default value

class Employee(BaseModel):
    name: str
    age: int
    email: str
    address: Address  # Nested model!
    skills: List[str] = []  # List with default

    @field_validator("age")
    @classmethod
    def age_must_be_positive(cls, v):
        if v < 0:
            raise ValueError("Age cannot be negative")
        return v

Now we can create an employee with a nested address:

emp = Employee(
    name="Jorge",
    age=28,
    email="jorge@example.com",
    address={"street": "123 Main St", "city": "Calgary", "country": "Canada"},
    skills=["Python", "SQL", "Data Modeling"]
)

print(emp.address.city)  # Calgary
print(emp.model_dump())  # Convert to dictionary
print(emp.model_dump_json())  # Convert to JSON string

✅ Why this matters: In real-world applications, data is rarely flat. Pydantic handles nested structures, lists, optional fields, and custom validation, all with clean syntax.
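
In practice the data usually arrives as a plain dictionary, for example from a parsed JSON body. The sketch below, assuming Pydantic v2, uses a trimmed-down version of the models above to show defaults being filled in and a nested problem being reported with its full path:

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    street: str
    city: str
    country: str = "Canada"


class Employee(BaseModel):
    name: str
    address: Address
    skills: List[str] = []


raw = {
    "name": "Jorge",
    "address": {"street": "123 Main St", "city": "Calgary"},
}
emp = Employee.model_validate(raw)
print(emp.address.country)  # Canada (filled in from the default)
print(emp.skills)           # []

# A problem inside the nested model is reported with its full path.
try:
    Employee.model_validate({"name": "Jorge", "address": {"street": "123 Main St"}})
except ValidationError as exc:
    error_paths = [err["loc"] for err in exc.errors()]
    print(error_paths)  # [('address', 'city')]
```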


Pydantic + AI: Structured LLM Outputs

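Here’s where it gets really interesting for AI applications. LLMs return text, but often you need structured data. Pydantic gives you a schema to ask for and to validate against. The example below uses the OpenAI Python SDK’s parse helper, so it needs the openai package and an API key to run: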

from pydantic import BaseModel
from openai import OpenAI

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    recommend: bool

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract movie review details."},
        {"role": "user", "content": "I just watched Inception. Solid 9/10. Mind-bending plot with great visuals. Definitely watch it."}
    ],
    response_format=MovieReview,
)

# parsed holds a MovieReview instance; it can be None if the model refuses to answer
review = completion.choices[0].message.parsed
print(review.title)      # Inception
print(review.rating)     # 9.0
print(review.recommend)  # True

✅ Why this matters: Instead of parsing messy text or hoping the LLM returns valid JSON, you get back an object that Pydantic has validated against your schema, and a mismatch surfaces as an error instead of silently bad data. That makes it a solid foundation for reliable AI applications.
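
The same idea works if you get JSON text from a model some other way. This is a rough sketch, assuming Pydantic v2; the parse_review helper and the two hard-coded strings are hypothetical stand-ins for real model output, so it runs without an API key:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    recommend: bool


def parse_review(raw_json: str) -> Optional[MovieReview]:
    """Return a validated MovieReview, or None if the text does not fit the schema."""
    try:
        return MovieReview.model_validate_json(raw_json)
    except ValidationError:
        return None


# Hypothetical model outputs, hard-coded for illustration.
good = '{"title": "Inception", "rating": 9, "summary": "Mind-bending plot.", "recommend": true}'
bad = '{"title": "Inception", "rating": "great"}'

review = parse_review(good)
print(review.rating)      # 9.0
print(parse_review(bad))  # None
```

Returning None is just one design choice. In a real application you might retry the request or log the validation errors instead.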


Quick Reference: Key Pydantic Features

Feature              What It Does
BaseModel            Define data models with type hints
field_validator      Add custom validation logic
model_dump()         Convert model to dictionary
model_dump_json()    Convert model to JSON string
model_validate()     Create model from dictionary
Optional[type]       Allow a field to be None (add a default such as = None to make it optional)
Default values       Set fallback values for fields
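
The Optional[type] row is worth a closer look. In Pydantic v2, Optional[type] on its own only means None is an accepted value; the field can be left out only if it also has a default. A short sketch, using a made-up Profile model:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Profile(BaseModel):
    nickname: Optional[str]        # may be None, but must still be provided
    bio: Optional[str] = None      # may be None and may be omitted


print(Profile(nickname=None).bio)  # None

try:
    Profile()  # nickname was not provided
except ValidationError as exc:
    missing = [err["loc"][0] for err in exc.errors() if err["type"] == "missing"]
    print(missing)  # ['nickname']
```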

Summary

Pydantic takes Python’s type hints and turns them into a runtime validation system. You define what your data should look like, and Pydantic enforces it. No more silent bugs from bad data, no more manual type checking, no more guessing what shape your data is in.

If you’re building anything with Python, whether it’s an API, a data pipeline, or an AI application, Pydantic should be in your toolkit.


Final Thoughts

When I was learning about RAG and LangChain, I kept seeing Pydantic show up in every tutorial. At first I thought it was just a nice-to-have, but once I started using it, I realized it’s essential. It’s the difference between code that might work and code that definitely works.

Start simple. Define a BaseModel. Add some type hints. Let Pydantic do the heavy lifting.

Cheers,

Jorge Rocha

Written on February 28, 2026