Introduction to Gen AI for Curious Developers
Building Multi-Modal Applications with Gemini (2nd Edition)
A hands-on, beginner-friendly guide to building real multi-modal AI applications in Python.
What this book is about
You have used ChatGPT to draft an email. You have asked Claude to explain a tricky bit of code. You have watched a Gemini demo describe a video frame by frame. You have probably read a thread about 'agents' that book travel or debug your repo while you sleep. Maybe a coworker shipped a feature last week that talks to a model, and you realised you could have built it too if you just knew where to start. Now you want to stop watching and start building. This book is for that exact moment.
The good news is that 2026 is the easiest year in history to do this. The tools have settled. The APIs are clean. The free tiers are generous. You can have a Python program talking to a frontier model in under twenty lines of code, and most of those lines are imports.
Who it is for
This book is for developers who know some Python and want to build with generative AI without slogging through a machine-learning degree first. If you can write a script, read an API reference, and install a package, you have everything you need.
The book assumes zero prior AI experience. It does not assume a GPU, a beefy laptop, or a credit card: the Gemini free tier covers every example in the book.
What it is not
This book will not teach machine-learning theory. You will not train or fine-tune a model, because in 2026 roughly nine out of ten real applications do not need it.
This book is not a survey of every API on the market. Breadth is the enemy of actually running the code. You will finish the book with one SDK in muscle memory and the confidence that the patterns transfer anywhere.
This book is not an agent architecture reference for enterprise work. For that, see Book One of The AI Black Book series, Enterprise AI Agents.
How the book is structured
Ten chapters, arranged to build on each other.
Chapter 1 orients you.
Chapter 2 gets your environment set up and your first generation working.
Chapter 3 introduces structured output and schemas so you can wire model responses into real code.
Chapter 4 covers tool use.
Chapter 5 is images in and out.
Chapter 6 is audio.
Chapter 7 is video.
Chapter 8 builds your first agent, a model that plans, acts, and iterates.
Chapter 9 is the capstone: a single multi-modal application that stitches the previous chapters into one product.
Chapter 10 is a short pointer chapter for where to go next.
Other books
Enterprise AI Agents: Design Principles and Practical Guidance
The AI Transformation Playbook: From Pilots to Production
AI Security for the Enterprise: A Threat-Model-First Playbook
AI-Native Business Models: Reinventing Companies in the Age of AI
Data Strategy in the Age of AI: Building the Retrieval-Ready Enterprise