Trixly AI Solutions
Agentic Software Engineering

How to Convert PDF Ebooks to Audiobooks Using Python and Kokoro TTS in Google Colab

By Muhammad Hassan
January 28, 2026 · 5 min read

Ever wanted to listen to your favorite ebooks while commuting, exercising, or doing chores? Converting PDFs to audiobooks is easier than you think. In this comprehensive guide, I'll walk you through creating a Python script that transforms any PDF ebook into a high-quality audiobook with multiple voices using Google Colab and the Kokoro text-to-speech engine.

What makes this solution special is that it's completely free, runs in the cloud, and produces natural-sounding audio with multiple voice options. You can even assign different voices to different parts of your book for a more engaging listening experience.

What You'll Need

✓ A Google account for accessing Google Colab and Google Drive

✓ A PDF ebook stored in your Google Drive

✓ Basic Python knowledge (helpful but not required)

✓ About 15-30 minutes depending on your ebook's length

Why Use Kokoro TTS?

While there are many text-to-speech engines out there, Kokoro stands out for several reasons. It produces remarkably natural-sounding voices, supports multiple speaker personalities, and works seamlessly in Google Colab's environment. Plus, it's completely free to use for personal projects.

The voices sound human-like enough that you won't get tired of listening, even during long chapters. And by rotating between different voices for different sections, you can create a more dynamic listening experience.

Step 1: Setting Up Your Environment

First, we need to connect Google Colab to your Google Drive where your PDF is stored. This is straightforward and only requires a single authorization step.

from google.colab import drive
drive.mount('/content/drive')

When you run this code, Colab will ask you to authorize access to your Drive. Approve the request in the pop-up window and sign in (older Colab versions instead show a link and ask you to paste an authorization code back into the notebook). That's it: your Drive files are now available under /content/drive/.

Step 2: Locating Your PDF Files

Now let's write a simple script to find all PDF files in a specific folder. This is helpful if you have multiple ebooks and want to process them one at a time.

import os

# Define the path to your ebook folder
# (current Colab mounts expose Drive as 'MyDrive', without a space)
folder_path = '/content/drive/MyDrive/ebook-hidden-love'

# Check if the folder exists
if not os.path.exists(folder_path):
    print(f"Error: Folder not found at {folder_path}")
else:
    # List all files in the folder
    all_files = os.listdir(folder_path)
    
    # Filter for PDF files
    pdf_files = [f for f in all_files if f.lower().endswith('.pdf')]
    
    if pdf_files:
        print("Found PDF files:")
        for pdf_file in pdf_files:
            print(pdf_file)
    else:
        print("No PDF files found in the folder.")
💡 Pro Tip: Make sure to update the folder_path variable to match where your PDFs are actually stored in Google Drive. On current Colab mounts the path starts with /content/drive/MyDrive/ followed by your folder structure; if your mount shows "My Drive" with a space instead, use that spelling.

Step 3: Extracting Text from the PDF

This is where the magic begins. We'll use PyPDF2 to extract all the text from your PDF file. The process is surprisingly simple, but we need to handle it carefully to ensure we don't lose important content.

# Install PyPDF2 if not already installed
# (the project is now maintained under the name pypdf)
!pip install PyPDF2

from PyPDF2 import PdfReader
import os

# Select the PDF you want to convert
selected_pdf_filename = pdf_files[0]  # Change index as needed
full_pdf_path = os.path.join(folder_path, selected_pdf_filename)

# Open and read the PDF
reader = PdfReader(full_pdf_path)

# Extract text from all pages
raw_text = ""
for page in reader.pages:
    # Fall back to "" if a page yields no extractable text
    raw_text += (page.extract_text() or "") + "\n"

print(f"Extracted {len(raw_text)} characters from {selected_pdf_filename}")

At this point, you have all the text from your PDF, but it's probably a bit messy. PDF extraction often includes headers, footers, page numbers, and weird spacing. That's what we'll clean up next.

Step 4: Cleaning and Processing the Text

Raw PDF text can be chaotic. We need to remove unwanted elements and format it properly for the text-to-speech engine. Here's how we do that:

import re

# Replace multiple newlines with single spaces
cleaned_text = re.sub(r'\n+', ' ', raw_text)

# Remove multiple spaces
cleaned_text = re.sub(r'\s+', ' ', cleaned_text).strip()

# Remove special characters that don't belong in natural text
cleaned_text = cleaned_text.replace('■', '')

# Split into sentences for better processing
sentences = re.split(r'(?<=[.!?])\s+', cleaned_text)

print(f"Cleaned text contains {len(sentences)} sentences")
print("First few sentences:")
for i, sent in enumerate(sentences[:3]):
    print(f"{i+1}. {sent}")
⚠️ Important: Depending on your PDF's format, you might need to adjust the cleaning patterns. Some ebooks have specific headers or footers that appear on every page. Take a look at your extracted text and add custom regex patterns to remove these recurring elements.
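
One way to act on that advice, sketched here under assumptions: keep each page's text separate before joining, drop lines that are only page numbers, and drop any line that repeats on most pages. The helper name `strip_recurring_lines`, the `page_texts` input, and the 60% threshold are illustrative choices, not part of PyPDF2.

```python
import re
from collections import Counter

def strip_recurring_lines(page_texts, min_fraction=0.6):
    """Drop page-number lines and lines that repeat on most pages.

    page_texts: list of strings, one per PDF page (hypothetical input,
    e.g. the individual page.extract_text() results from Step 3).
    """
    pages = [[ln.strip() for ln in text.splitlines() if ln.strip()]
             for text in page_texts]

    # Count on how many pages each distinct line appears
    counts = Counter()
    for lines in pages:
        counts.update(set(lines))

    # A line is "recurring" if it shows up on at least this many pages
    threshold = max(2, int(len(pages) * min_fraction))
    page_number = re.compile(r'^(page\s+)?\d+$', re.IGNORECASE)

    cleaned_pages = []
    for lines in pages:
        kept = [ln for ln in lines
                if not page_number.match(ln) and counts[ln] < threshold]
        cleaned_pages.append(' '.join(kept))
    return ' '.join(p for p in cleaned_pages if p)
```

A short line of real prose that happens to repeat on many pages would also be removed, so it is worth printing a sample of the result before moving on.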

Step 5: Chunking Text for TTS Processing

Text-to-speech engines work best with smaller chunks of text rather than entire chapters at once. We'll group our sentences into manageable pieces, typically around 10 sentences per chunk. This also makes it easier to assign different voices to different sections.

text_chunks = []
chunk_size = 10

for i in range(0, len(sentences), chunk_size):
    chunk = ' '.join(sentences[i:i + chunk_size])
    text_chunks.append(chunk)

print(f"Created {len(text_chunks)} text chunks")
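
Grouping by sentence count can produce very uneven chunks when some sentences are long. A possible variant, assuming you would rather cap each chunk by character count: `chunk_by_chars` is a hypothetical helper, and the default of 1000 characters is an arbitrary illustration rather than a Kokoro limit.

```python
def chunk_by_chars(sentences, max_chars=1000):
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk.
    """
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        # +1 accounts for the joining space when the chunk is non-empty
        extra = len(sentence) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            # Close the current chunk and start a new one
            chunks.append(' '.join(current))
            current, current_len = [], 0
            extra = len(sentence)
        current.append(sentence)
        current_len += extra
    if current:
        chunks.append(' '.join(current))
    return chunks
```

The result can be used in place of `text_chunks`; the rest of the workflow does not care how the chunks were formed.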

Step 6: Installing and Setting Up Kokoro TTS

Now comes the exciting part – setting up the text-to-speech engine. Kokoro requires a system library called espeak-ng, which we'll install first, followed by the Kokoro library itself.

# Install system dependencies
!apt-get update && apt-get install -y espeak-ng libespeak-ng-dev

# Install Kokoro TTS and soundfile (used in Step 10 to write the WAV file)
!pip install kokoro soundfile

print("Install commands finished. Check the output above for errors.")

Step 7: Initializing Multiple Voices

One of the coolest features of this approach is using multiple voices. This makes long audiobooks much more engaging. Here's how to set up the voice rotation system:

from kokoro import KPipeline

# Initialize the TTS pipeline
k_pipeline = KPipeline(lang_code='a')

# Define multiple voices to cycle through
voices = ['af_heart', 'af_bella', 'am_adam', 'am_michael']

print(f"Initialized Kokoro with {len(voices)} voices")
print(f"Voices: {', '.join(voices)}")

The voices include both female (af_heart, af_bella) and male (am_adam, am_michael) options. The script will automatically rotate through these voices for each chunk of text, creating a multi-narrator effect.
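
Rotating by chunk index is the simplest scheme. If you would rather give quoted speech a different voice from the narration, one rough approach is to split the text on quotation marks first. This is only a sketch: `split_dialogue` and the `voice_for` mapping are hypothetical names, and it assumes dialogue is wrapped in straight or curly double quotes, which not every book follows.

```python
import re

def split_dialogue(text):
    """Split text into ('narration' | 'dialogue', fragment) pairs.

    Assumes dialogue is wrapped in straight or curly double quotes.
    """
    # The capturing group makes re.split keep the quoted spans
    pattern = re.compile(r'("[^"]*"|“[^”]*”)')
    parts = []
    for fragment in pattern.split(text):
        fragment = fragment.strip()
        if not fragment:
            continue
        kind = 'dialogue' if fragment[0] in '"“' else 'narration'
        parts.append((kind, fragment))
    return parts

# Hypothetical mapping: one voice for narration, another for speech
voice_for = {'narration': 'af_heart', 'dialogue': 'am_adam'}
```

Each fragment could then be synthesized with `voice_for[kind]` instead of a voice picked by position.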

Step 8: Converting Text to Audio

This is where everything comes together. We'll process each text chunk, assign it a voice, and generate audio. The script handles errors gracefully, so if one chunk fails, the rest will continue processing.

import numpy as np
import soundfile as sf

kokoro_audio_segments = []
num_voices = len(voices)

print("Starting text-to-speech conversion...")

for i, chunk in enumerate(text_chunks):
    # Rotate through available voices
    current_voice = voices[i % num_voices]
    
    try:
        # Generate audio for this chunk
        generator = k_pipeline(chunk, voice=current_voice)
        
        # Collect all audio arrays (np.asarray also converts the CPU
        # torch tensors that some Kokoro releases yield)
        audio_arrays = []
        for gs, ps, audio in generator:
            audio_arrays.append(np.asarray(audio))
        
        # Combine into single array
        if audio_arrays:
            combined_audio = np.concatenate(audio_arrays)
        else:
            combined_audio = np.array([])
        
        kokoro_audio_segments.append(combined_audio)
        
        print(f"✓ Processed chunk {i+1}/{len(text_chunks)} with {current_voice}")
        
    except Exception as e:
        print(f"✗ Error on chunk {i+1}: {e}")
        kokoro_audio_segments.append(np.array([]))

print(f"\nGenerated {len(kokoro_audio_segments)} audio segments!")
💡 Performance Note: Processing time depends on your text length. A typical chapter might take 5-10 minutes to convert. You can monitor progress through the printed status messages.

Step 9: Combining Audio Segments

Now we have multiple audio chunks, but we want one continuous audiobook file. Let's merge all the segments together:

# Filter out any empty segments from errors
valid_segments = [seg for seg in kokoro_audio_segments if seg.size > 0]
if not valid_segments:
    raise RuntimeError("No audio was generated. Check the errors printed in Step 8.")

# Combine all segments into one audio array
combined_audio = np.concatenate(valid_segments)
combined_audio = combined_audio.astype(np.float32)

print(f"Combined audio shape: {combined_audio.shape}")
print(f"Total duration: ~{len(combined_audio) / 24000 / 60:.1f} minutes")
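
If the switch from one voice to the next feels abrupt, one option is to put a short silence between segments before concatenating. This is a minimal sketch: `join_with_silence` and the 0.4-second pause are illustrative assumptions, and 24000 Hz is the sample rate used throughout this tutorial.

```python
import numpy as np

def join_with_silence(segments, sample_rate=24000, pause_seconds=0.4):
    """Concatenate non-empty audio segments with silence between them."""
    valid = [np.asarray(seg, dtype=np.float32) for seg in segments
             if np.asarray(seg).size > 0]
    if not valid:
        return np.zeros(0, dtype=np.float32)

    # A block of zeros is heard as silence
    silence = np.zeros(int(round(sample_rate * pause_seconds)), dtype=np.float32)
    pieces = []
    for idx, seg in enumerate(valid):
        if idx > 0:
            pieces.append(silence)
        pieces.append(seg)
    return np.concatenate(pieces)
```

Calling `join_with_silence(kokoro_audio_segments)` would return a single float32 array that can be saved the same way as `combined_audio`.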

Step 10: Saving Your Audiobook

The final step is saving your newly created audiobook back to Google Drive so you can download it and listen anywhere:

import os

# Create output folder if it doesn't exist
output_folder = "/content/drive/MyDrive/audio_chapters"
os.makedirs(output_folder, exist_ok=True)

# Save the audiobook
output_path = os.path.join(output_folder, "chapter_1.wav")
sampling_rate = 24000  # Kokoro's default sample rate

sf.write(output_path, combined_audio, sampling_rate)
print(f"✓ Audiobook saved to: {output_path}")
print(f"File size: {os.path.getsize(output_path) / (1024*1024):.1f} MB")

Your audiobook is now ready! You can find it in your Google Drive in the audio_chapters folder. Download it to your phone or computer and start listening.

Tips for Better Results

Choose Clean PDFs

The quality of your audiobook depends heavily on the source PDF. Text-based PDFs work much better than scanned images. If you have a scanned PDF, consider using OCR software first.
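
A quick way to spot a scanned PDF, sketched under assumptions: if most pages yield almost no extracted text, the file is probably image-based. The function name `looks_scanned` and both thresholds are illustrative guesses to tune for your own files, not established rules.

```python
def looks_scanned(page_texts, min_chars=20, max_empty_fraction=0.5):
    """Return True if most pages yielded almost no extractable text.

    page_texts: one string (or None) per page, e.g. the values
    returned by page.extract_text() for each page of the PDF.
    """
    if not page_texts:
        return True
    # Count pages whose stripped text is shorter than min_chars
    empty = sum(1 for text in page_texts
                if len((text or '').strip()) < min_chars)
    return empty / len(page_texts) > max_empty_fraction
```

If this returns True, running the file through OCR before extraction is likely to give better results than cleaning the text afterwards.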

Experiment with Voice Assignments

Try different voice combinations to find what sounds best for your content. You might want all female voices for a romance novel or all male voices for a technical manual.
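
Because the voice names used in this tutorial start with `af_` (female) or `am_` (male), filtering by prefix is a simple way to build a single-gender rotation. The helper below is a hypothetical convenience, not part of Kokoro.

```python
def voices_with_prefix(voices, prefix):
    """Return the voices whose names start with the given prefix."""
    selected = [v for v in voices if v.startswith(prefix)]
    if not selected:
        raise ValueError(f"No voices start with {prefix!r}")
    return selected

all_voices = ['af_heart', 'af_bella', 'am_adam', 'am_michael']
female_voices = voices_with_prefix(all_voices, 'af_')
male_voices = voices_with_prefix(all_voices, 'am_')
```

Either list can replace `voices` before the conversion step; the rotation logic stays the same.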

Process Chapters Individually

Rather than converting an entire book at once, process it chapter by chapter. This makes the files more manageable and easier to navigate when listening.
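
If your book marks chapters with headings like "Chapter 1", a regular expression can do the splitting for you. The sketch below assumes that heading style and is meant to run on the cleaned text; `split_chapters` and the "Front matter" label are hypothetical names.

```python
import re

def split_chapters(text):
    """Split text at headings such as 'Chapter 1' or 'CHAPTER 12'.

    Returns a list of (heading, body) tuples. Text before the first
    heading is returned under the heading 'Front matter'.
    """
    heading = re.compile(r'\b(chapter\s+\d+)\b', re.IGNORECASE)
    pieces = heading.split(text)

    chapters = []
    if pieces[0].strip():
        chapters.append(('Front matter', pieces[0].strip()))
    # After the first element, pieces alternate: heading, body, ...
    for i in range(1, len(pieces) - 1, 2):
        chapters.append((pieces[i], pieces[i + 1].strip()))
    return chapters
```

Note that a phrase such as "as we saw in chapter 3" inside the prose would also trigger a split, so check the resulting headings before converting each piece.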

Adjust Chunk Size

The default chunk size of 10 sentences works well, but you can adjust it. Larger chunks mean fewer voice switches, while smaller chunks create more variety.

Troubleshooting Common Issues

Problem: "Folder not found" Error

Double-check your folder path. On current Colab mounts, Google Drive paths start with /content/drive/MyDrive/ (no space) and are case-sensitive.

Problem: Poor Audio Quality

This usually means the extracted text was messy. Spend more time on the cleaning step, removing headers, footers, and formatting artifacts.

Problem: Very Long Processing Time

Large books take time. If it's taking too long, consider breaking your PDF into smaller sections before processing. You can also reduce the chunk size to see progress faster.

Problem: Some Chunks Failing

The error handling in the script ensures that one failed chunk doesn't stop the entire process. Check the printed error messages to see what went wrong with specific chunks.
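
Failures are sometimes transient, so a second pass over only the failed chunks can recover them. The following is a sketch under assumptions: `retry_failed` is a hypothetical helper, and `synthesize` stands in for whatever function you wrap around the Kokoro call.

```python
def retry_failed(chunks, failed_indices, synthesize, max_attempts=2):
    """Re-run synthesis for chunks that failed on the first pass.

    synthesize: hypothetical callable that takes a text chunk and
    returns audio; it stands in for the Kokoro call.
    Returns (recovered, still_failed): a dict of index -> audio and a
    list of indices that failed on every attempt.
    """
    recovered, still_failed = {}, []
    for idx in failed_indices:
        for attempt in range(1, max_attempts + 1):
            try:
                recovered[idx] = synthesize(chunks[idx])
                break
            except Exception as exc:
                print(f"Chunk {idx + 1}, attempt {attempt} failed: {exc}")
        else:
            # The loop finished without a successful break
            still_failed.append(idx)
    return recovered, still_failed
```

Chunks that keep failing often contain unusual characters or very long sentences, so printing the text of `still_failed` entries is a reasonable next step.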

What's Next?

Once you've mastered the basics, you can extend this script in many ways. Here are some ideas:

Batch Processing: Modify the script to automatically process all PDFs in a folder, creating a complete audiobook library.

Custom Voice Mapping: Assign specific voices to specific characters if you're converting dialogue-heavy content like novels or plays.

Background Music: Add subtle background music or sound effects between chapters to make your audiobook more professional.

Speed Control: Adjust the playback speed in post-processing to create faster or slower versions for different listening situations.
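
As a starting point for the batch-processing idea, here is a small sketch that pairs each PDF in a folder with an output file name and skips books that were already converted. `plan_batch` is a hypothetical helper, and naming each output after the PDF with a .wav extension is an assumption.

```python
from pathlib import Path

def plan_batch(input_folder, output_folder):
    """Pair every PDF in input_folder with a .wav output path.

    Returns a list of (pdf_path, wav_path) tuples sorted by file name.
    PDFs whose .wav already exists are skipped, so a rerun resumes
    where the previous one stopped.
    """
    input_folder, output_folder = Path(input_folder), Path(output_folder)
    jobs = []
    for pdf_path in sorted(input_folder.iterdir()):
        if pdf_path.suffix.lower() != '.pdf':
            continue
        wav_path = output_folder / (pdf_path.stem + '.wav')
        if not wav_path.exists():
            jobs.append((pdf_path, wav_path))
    return jobs
```

Looping over the returned pairs and running the extraction, cleaning, and conversion steps for each one would turn the single-book workflow into a batch job.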

Wrapping Up

Creating your own audiobooks might seem daunting at first, but as you've seen, Python makes it surprisingly accessible. With just a few libraries and some free cloud computing power from Google Colab, you can transform any PDF into a listenable audiobook.

The best part? You have complete control over the process. You choose the voices, control the pacing, and can customize every aspect of the output. Whether you're converting textbooks for studying, novels for entertainment, or research papers for review, this approach works beautifully.

I've been using this method for months now, and it's completely changed how I consume written content. Long commutes have become opportunities to catch up on reading, and I can multitask while "reading" books I'd never have time to sit down with.

Give it a try with your favorite ebook, and you might just discover a whole new way to enjoy literature. Happy listening!

Have questions or run into issues? The Kokoro and PyPDF2 communities are very helpful, and you can also reach out to us for support. Always remember to respect copyright when converting books to audiobooks: only convert content you own or have the right to use.

