Ever wanted to listen to your favorite ebooks while commuting, exercising, or doing chores? Converting PDFs to audiobooks is easier than you think. In this comprehensive guide, I'll walk you through creating a Python script that transforms a text-based PDF ebook into a high-quality audiobook with multiple voices using Google Colab and the Kokoro text-to-speech engine.
What makes this solution special is that it's completely free, runs in the cloud, and produces natural-sounding audio with multiple voice options. You can even assign different voices to different parts of your book for a more engaging listening experience.
What You'll Need
✓ A Google account for accessing Google Colab and Google Drive
✓ A PDF ebook stored in your Google Drive
✓ Basic Python knowledge (helpful but not required)
✓ About 15-30 minutes depending on your ebook's length
Why Use Kokoro TTS?
While there are many text-to-speech engines out there, Kokoro stands out for several reasons. It's a compact open-weight model (about 82 million parameters) that produces remarkably natural-sounding speech, ships with a range of built-in voices, and installs with a single pip command in Google Colab. Plus, it's released under the Apache 2.0 license, so it's free to use.
The voices sound human-like enough that you won't get tired of listening, even during long chapters. And by rotating between different voices for different sections, you can create a more dynamic listening experience.
Step 1: Setting Up Your Environment
First, we need to connect Google Colab to your Google Drive where your PDF is stored. This is straightforward and only requires a single authorization step.
```python
from google.colab import drive
drive.mount('/content/drive')
```
When you run this cell, Colab asks for permission to access your Google Drive. Follow the prompt, choose your Google account, and grant access. That's it: your Drive's contents are now available under /content/drive/MyDrive.
Step 2: Locating Your PDF Files
Now let's write a simple script to find all PDF files in a specific folder. This is helpful if you have multiple ebooks and want to process them one at a time.
```python
import os

# Define the path to your ebook folder
folder_path = '/content/drive/MyDrive/ebook-hidden-love'

# Check if the folder exists
if not os.path.exists(folder_path):
    print(f"Error: Folder not found at {folder_path}")
else:
    # List all files in the folder
    all_files = os.listdir(folder_path)
    # Filter for PDF files; sort them because os.listdir returns names in arbitrary order
    pdf_files = sorted(f for f in all_files if f.lower().endswith('.pdf'))
    if pdf_files:
        print("Found PDF files:")
        for pdf_file in pdf_files:
            print(pdf_file)
    else:
        print("No PDF files found in the folder.")
```
Update the folder_path variable to match where your PDFs are actually stored in Google Drive. With the default mount from Step 1, the path starts with /content/drive/MyDrive/ followed by your folder structure.
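If your ebooks are organized into subfolders, a pathlib-based variant can search recursively as well. This is a minimal sketch; find_pdfs is a hypothetical helper name, and it returns paths relative to the folder so they still work with os.path.join(folder_path, ...) in the next step.

```python
from pathlib import Path

def find_pdfs(folder, recursive=False):
    """Return PDF paths under `folder` (relative to it), sorted for a stable order."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Folder not found: {root}")
    # rglob("*") walks all subfolders; iterdir() only looks at the top level
    candidates = root.rglob("*") if recursive else root.iterdir()
    # Compare suffixes case-insensitively so "Book.PDF" is found too
    return sorted(str(p.relative_to(root)) for p in candidates
                  if p.is_file() and p.suffix.lower() == ".pdf")

# Example (same folder as above):
# pdf_files = find_pdfs('/content/drive/MyDrive/ebook-hidden-love', recursive=True)
```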
Step 3: Extracting Text from the PDF
This is where the magic begins. We'll use PyPDF2 to extract the text from your PDF one page at a time, and the process is surprisingly simple. (PyPDF2 is no longer developed under that name; its successor, pypdf, provides the same PdfReader class if you prefer to switch.)
```python
# Install PyPDF2 if not already installed
!pip install PyPDF2

import os
from PyPDF2 import PdfReader

# Select the PDF you want to convert (pdf_files comes from Step 2)
selected_pdf_filename = pdf_files[0]  # Change index as needed
full_pdf_path = os.path.join(folder_path, selected_pdf_filename)

# Open and read the PDF
reader = PdfReader(full_pdf_path)

# Extract text from all pages; image-only pages may yield no text, so fall back to ""
raw_text = ""
for page in reader.pages:
    raw_text += (page.extract_text() or "") + "\n"

print(f"Extracted {len(raw_text)} characters from {selected_pdf_filename}")
```
At this point, you have all the text from your PDF, but it's probably a bit messy. PDF extraction often includes headers, footers, page numbers, and weird spacing. That's what we'll clean up next.
Step 4: Cleaning and Processing the Text
Raw PDF text can be chaotic. Before handing it to the text-to-speech engine, we'll normalize the whitespace, strip stray symbols, and split the text into sentences:
```python
import re

# Remove special characters that don't belong in natural text (extend as needed)
cleaned_text = raw_text.replace('■', '')
# Collapse newlines, tabs, and repeated spaces into single spaces
cleaned_text = re.sub(r'\s+', ' ', cleaned_text).strip()

# Split into sentences after ".", "!" or "?" followed by whitespace.
# This simple rule also splits after abbreviations such as "Mr." or "e.g."
sentences = re.split(r'(?<=[.!?])\s+', cleaned_text)

print(f"Cleaned text contains {len(sentences)} sentences")
print("First few sentences:")
for i, sent in enumerate(sentences[:3]):
    print(f"{i+1}. {sent}")
```
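The cleanup above only normalizes whitespace; it doesn't remove page numbers or rejoin words hyphenated across line breaks. The sketch below handles those two cases. The strip_pdf_artifacts name and both patterns are my own assumptions, not part of PyPDF2, and the function has to run on raw_text before the whitespace is collapsed, because it relies on the line breaks still being present.

```python
import re

def strip_pdf_artifacts(text):
    """Remove common PDF extraction debris while line breaks are still present."""
    # Rejoin words hyphenated across a line break: "conver-" + newline + "sion" -> "conversion".
    # Caveat: a real compound split at a line break ("well-known") loses its hyphen.
    text = re.sub(r'(\w)-\n(\w)', r'\1\2', text)
    # Blank out lines that hold nothing but a page number, e.g. "42" or "- 42 -".
    text = re.sub(r'^[ \t]*-?[ \t]*\d+[ \t]*-?[ \t]*$', '', text, flags=re.MULTILINE)
    return text

# Usage: clean raw_text from Step 3, then rerun the cleanup cell above.
# raw_text = strip_pdf_artifacts(raw_text)
```

Note that any line consisting only of a number is dropped, so a year or figure set on its own line would disappear too.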
Step 5: Chunking Text for TTS Processing
Text-to-speech engines work best with smaller chunks of text rather than entire chapters at once. We'll group our sentences into manageable pieces, typically around 10 sentences per chunk. This also makes it easier to assign different voices to different sections.
```python
text_chunks = []
chunk_size = 10  # sentences per chunk

for i in range(0, len(sentences), chunk_size):
    chunk = ' '.join(sentences[i:i + chunk_size])
    text_chunks.append(chunk)

print(f"Created {len(text_chunks)} text chunks")
```
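Ten sentences can mean anything from a couple of lines to a full page, so chunk lengths (and the time each chunk takes) can vary a lot. If you'd rather have more even chunks, you can group sentences by a character budget instead. This is a minimal sketch: chunk_by_chars is a hypothetical name, and the 1,000-character default is an arbitrary choice, not a Kokoro requirement.

```python
def chunk_by_chars(sentences, max_chars=1000):
    """Group consecutive sentences into chunks of at most about max_chars characters.

    A sentence longer than max_chars becomes a chunk of its own instead of being cut.
    """
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue  # re.split can yield empty strings, e.g. for empty input
        # Length this sentence adds, including the joining space if needed
        extra = len(sentence) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            chunks.append(' '.join(current))
            current, current_len = [], 0
            extra = len(sentence)
        current.append(sentence)
        current_len += extra
    if current:
        chunks.append(' '.join(current))
    return chunks

# Usage: text_chunks = chunk_by_chars(sentences, max_chars=1000)
```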
Step 6: Installing and Setting Up Kokoro TTS
Now comes the exciting part: setting up the text-to-speech engine. Kokoro relies on the espeak-ng system library for part of its text-to-phoneme conversion, so we install that first, followed by the Kokoro library itself.
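```python
# Install system dependencies (espeak-ng is used by Kokoro's phonemizer)
!apt-get update && apt-get install -y espeak-ng libespeak-ng-dev

# Install Kokoro TTS, plus soundfile for writing the WAV file later
!pip install kokoro soundfile

print("Kokoro TTS successfully installed!")
```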
Step 7: Initializing Multiple Voices
One of the coolest features of this approach is using multiple voices. This makes long audiobooks much more engaging. Here's how to set up the voice rotation system:
```python
from kokoro import KPipeline

# Initialize the TTS pipeline; lang_code='a' selects American English.
# The first run downloads the model weights, so it takes a little longer.
k_pipeline = KPipeline(lang_code='a')

# Define multiple voices to cycle through
voices = ['af_heart', 'af_bella', 'am_adam', 'am_michael']

print(f"Initialized Kokoro with {len(voices)} voices")
print(f"Voices: {', '.join(voices)}")
```
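The voices include both female (af_heart, af_bella) and male (am_adam, am_michael) options. In Kokoro's naming, the first letter is the language (a for American English, matching lang_code='a') and the second is the gender (f or m). The script will automatically rotate through these voices for each chunk of text, creating a multi-narrator effect.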
Step 8: Converting Text to Audio
This is where everything comes together. We'll process each text chunk, assign it a voice, and generate audio. The script handles errors gracefully, so if one chunk fails, the rest will continue processing.
```python
import numpy as np
import soundfile as sf

kokoro_audio_segments = []
num_voices = len(voices)

print("Starting text-to-speech conversion...")

for i, chunk in enumerate(text_chunks):
    # Rotate through available voices
    current_voice = voices[i % num_voices]
    try:
        # Generate audio for this chunk
        generator = k_pipeline(chunk, voice=current_voice)
        # Each item yields (graphemes, phonemes, audio); keep only the audio
        audio_arrays = []
        for gs, ps, audio in generator:
            audio_arrays.append(audio)
        # Combine into a single array
        if audio_arrays:
            combined_audio = np.concatenate(audio_arrays)
        else:
            combined_audio = np.array([])
        kokoro_audio_segments.append(combined_audio)
        print(f"✓ Processed chunk {i+1}/{len(text_chunks)} with {current_voice}")
    except Exception as e:
        print(f"✗ Error on chunk {i+1}: {e}")
        kokoro_audio_segments.append(np.array([]))

print(f"\nGenerated {len(kokoro_audio_segments)} audio segments!")
```
Step 9: Combining Audio Segments
Now we have multiple audio chunks, but we want one continuous audiobook file. Let's merge all the segments together:
```python
import numpy as np

# Filter out any empty segments left by failed chunks
valid_segments = [seg for seg in kokoro_audio_segments if seg.size > 0]
if not valid_segments:
    raise RuntimeError("No audio was generated; check the errors printed in Step 8.")

# Combine all segments into one continuous array
combined_audio = np.concatenate(valid_segments).astype(np.float32)

print(f"Combined audio shape: {combined_audio.shape}")
# Kokoro produces 24,000 samples per second
print(f"Total duration: ~{len(combined_audio) / 24000 / 60:.1f} minutes")
```
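Concatenating segments back to back means each voice starts the instant the previous one stops. A short pause between segments can make the voice changes less abrupt. This sketch assumes Kokoro's 24 kHz output rate; join_with_silence and the 0.4-second gap are illustrative choices, not part of Kokoro.

```python
import numpy as np

def join_with_silence(segments, sample_rate=24000, gap_seconds=0.4):
    """Concatenate non-empty audio segments with a short silence between them."""
    valid = [np.asarray(seg, dtype=np.float32) for seg in segments if len(seg) > 0]
    if not valid:
        return np.zeros(0, dtype=np.float32)
    # A block of zeros is digital silence at the given sample rate
    gap = np.zeros(int(sample_rate * gap_seconds), dtype=np.float32)
    pieces = []
    for i, seg in enumerate(valid):
        if i > 0:
            pieces.append(gap)  # no gap before the first segment
        pieces.append(seg)
    return np.concatenate(pieces)

# Usage, replacing the filter-and-concatenate lines above:
# combined_audio = join_with_silence(kokoro_audio_segments)
```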
Step 10: Saving Your Audiobook
The final step is saving your newly created audiobook back to Google Drive so you can download it and listen anywhere:
```python
import os
import soundfile as sf

# Create the output folder if it doesn't exist
output_folder = "/content/drive/MyDrive/audio_chapters"
os.makedirs(output_folder, exist_ok=True)

# Save the audiobook; change the file name for each chapter so earlier files aren't overwritten
output_path = os.path.join(output_folder, "chapter_1.wav")
sampling_rate = 24000  # Kokoro's output sample rate
sf.write(output_path, combined_audio, sampling_rate)

print(f"✓ Audiobook saved to: {output_path}")
print(f"File size: {os.path.getsize(output_path) / (1024*1024):.1f} MB")
```
Your audiobook is now ready! You can find it in your Google Drive in the audio_chapters folder. Download it to your phone or computer and start listening.
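How big will the file be? Since sf.write uses 16-bit PCM as the default subtype for WAV files, each sample takes 2 bytes: 24,000 × 2 = 48,000 bytes per second, roughly 2.9 MB per minute or about 173 MB per hour of audio. The helper below (a hypothetical name) makes that estimate before you generate a whole book; the 44-byte header is the canonical WAV header size, so treat the result as an approximation.

```python
def estimated_wav_bytes(duration_seconds, sample_rate=24000, channels=1, bytes_per_sample=2):
    """Approximate size of a PCM WAV file: a 44-byte header plus the raw samples."""
    samples = int(duration_seconds * sample_rate)
    return 44 + samples * channels * bytes_per_sample

# For example, one hour of Kokoro output (prints 172.8 MB):
print(f"{estimated_wav_bytes(3600) / 1e6:.1f} MB")
```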
Tips for Better Results
Choose Clean PDFs
The quality of your audiobook depends heavily on the source PDF. Text-based PDFs work much better than scanned images. If you have a scanned PDF, consider using OCR software first.
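A quick way to tell whether a PDF is scanned is to check how much text each page yields. The sketch below flags pages with almost no extractable text; find_low_text_pages and the 20-character threshold are assumptions for illustration, and intentionally blank pages will be flagged too.

```python
def find_low_text_pages(page_texts, min_chars=20):
    """Return 0-based indices of pages whose extracted text is (nearly) empty."""
    # extract_text() may return None or "" for image-only pages, so treat both as empty
    return [i for i, text in enumerate(page_texts)
            if len((text or "").strip()) < min_chars]

# Usage with the reader from Step 3:
# page_texts = [page.extract_text() for page in reader.pages]
# suspect = find_low_text_pages(page_texts)
# print(f"{len(suspect)} of {len(page_texts)} pages have little or no text")
```

If most pages are flagged, the PDF is probably scanned and should go through OCR before conversion.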
Experiment with Voice Assignments
Try different voice combinations to find what sounds best for your content. You might want all female voices for a romance novel or all male voices for a technical manual.
Process Chapters Individually
Rather than converting an entire book at once, process it chapter by chapter. This makes the files more manageable and easier to navigate when listening.
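One way to work chapter by chapter is to extract only that chapter's pages instead of the whole book. This sketch assumes you look up the chapter's page range yourself; extract_page_range is a hypothetical helper that works with the PdfReader from Step 3, and its page numbers are positions in the PDF file, which may differ from the numbers printed in the book.

```python
def extract_page_range(reader, start_page, end_page):
    """Extract text from PDF pages start_page..end_page (1-based, inclusive)."""
    if not 1 <= start_page <= end_page <= len(reader.pages):
        raise ValueError(f"Invalid page range {start_page}-{end_page}")
    texts = []
    for index in range(start_page - 1, end_page):
        # Image-only pages may yield no text, so fall back to an empty string
        texts.append(reader.pages[index].extract_text() or "")
    return "\n".join(texts)

# Usage (hypothetical page range for one chapter), then continue with Step 4:
# raw_text = extract_page_range(reader, 12, 38)
```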
Adjust Chunk Size
The default chunk size of 10 sentences works well, but you can adjust it. Larger chunks mean fewer voice switches, while smaller chunks create more variety.
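Changing the chunk size affects both how much text goes to Kokoro per call and how often the voice changes. To control the voice switching separately, you can keep the chunks as they are and change voice only every few chunks. In this sketch, voice_for_chunk and chunks_per_voice are hypothetical names.

```python
def voice_for_chunk(chunk_index, voices, chunks_per_voice=1):
    """Pick a voice for a 0-based chunk index, switching every `chunks_per_voice` chunks.

    With chunks_per_voice=1 this matches the voices[i % num_voices] rotation in Step 8.
    """
    return voices[(chunk_index // chunks_per_voice) % len(voices)]

# Usage in the Step 8 loop:
# current_voice = voice_for_chunk(i, voices, chunks_per_voice=5)
```

Passing a one-element list such as ['af_heart'] gives you a single narrator for the whole book.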
Troubleshooting Common Issues
Problem: "Folder not found" Error
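Double-check your folder path. With the default mount, Google Drive paths in Colab start with /content/drive/MyDrive/ and are case-sensitive. Older tutorials sometimes use /content/drive/My Drive/, which may not exist on current Colab mounts.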
Problem: Poor Audio Quality
This usually means the extracted text was messy. Spend more time on the cleaning step, removing headers, footers, and formatting artifacts.
Problem: Very Long Processing Time
Large books take time. If it's taking too long, consider breaking your PDF into smaller sections before processing. You can also reduce the chunk size to see progress faster.
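For large books it also helps to make the run resumable, since a Colab session can disconnect partway through. The sketch below saves each chunk's audio as a .npy file and skips chunks that already have one on the next run. synthesize_with_cache and synthesize_chunk are hypothetical names: synthesize_chunk stands for whatever function you write around the body of the Step 8 loop. Empty results are not cached, so chunks that failed are retried on the next run.

```python
import os
import numpy as np

def synthesize_with_cache(text_chunks, synthesize_chunk, cache_dir):
    """Call synthesize_chunk(i, text) per chunk, caching non-empty results as .npy files."""
    os.makedirs(cache_dir, exist_ok=True)
    segments = []
    for i, chunk in enumerate(text_chunks):
        cache_path = os.path.join(cache_dir, f"chunk_{i:05d}.npy")
        if os.path.exists(cache_path):
            segments.append(np.load(cache_path))  # already generated in an earlier run
            continue
        audio = np.asarray(synthesize_chunk(i, chunk), dtype=np.float32)
        if audio.size > 0:
            np.save(cache_path, audio)  # only cache successful chunks
        segments.append(audio)
    return segments

# Example wiring with the pipeline from Step 7 (not executed here):
# def synthesize_chunk(i, chunk):
#     voice = voices[i % len(voices)]
#     return np.concatenate([audio for _, _, audio in k_pipeline(chunk, voice=voice)])
# cache_dir = "/content/drive/MyDrive/audio_chapters/cache"  # hypothetical location
# kokoro_audio_segments = synthesize_with_cache(text_chunks, synthesize_chunk, cache_dir)
```

The cache is keyed only by chunk position, so clear the folder whenever you change the text or the chunking.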
Problem: Some Chunks Failing
The error handling in the script ensures that one failed chunk doesn't stop the entire process. Check the printed error messages to see what went wrong with specific chunks.
What's Next?
Once you've mastered the basics, you can extend this script in many ways. Here are some ideas:
Batch Processing: Modify the script to automatically process all PDFs in a folder, creating a complete audiobook library.
Custom Voice Mapping: Assign specific voices to specific characters if you're converting dialogue-heavy content like novels or plays.
Background Music: Add subtle background music or sound effects between chapters to make your audiobook more professional.
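Speed Control: Kokoro's pipeline accepts a speed argument (for example, k_pipeline(chunk, voice=current_voice, speed=1.2)), so you can generate faster or slower narration directly, or adjust the playback speed in post-processing for different listening situations.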
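Attributing lines of dialogue to specific characters is hard to automate, but a simpler first step toward custom voice mapping is to read quoted speech in one voice and narration in another. This sketch splits text on straight or curly double quotes; split_dialogue is a hypothetical helper, and it assumes the quotes are balanced within the text you pass in, so a quote cut off at a chunk boundary will be misread.

```python
import re

# Matches text inside straight or curly double quotes, including the quote marks
_QUOTE_PATTERN = re.compile(r'"[^"]*"|“[^”]*”')

def split_dialogue(text, narrator_voice, dialogue_voice):
    """Split text into (voice, segment) pairs: quoted speech vs. narration."""
    segments = []
    position = 0
    for match in _QUOTE_PATTERN.finditer(text):
        narration = text[position:match.start()].strip()
        if narration:
            segments.append((narrator_voice, narration))
        segments.append((dialogue_voice, match.group()))
        position = match.end()
    tail = text[position:].strip()
    if tail:
        segments.append((narrator_voice, tail))
    return segments

# Usage inside the Step 8 loop:
# for voice, segment in split_dialogue(chunk, 'af_heart', 'am_adam'):
#     generator = k_pipeline(segment, voice=voice)
```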
Wrapping Up
Creating your own audiobooks might seem daunting at first, but as you've seen, Python makes it surprisingly accessible. With just a few libraries and some free cloud computing power from Google Colab, you can transform any PDF into a listenable audiobook.
The best part? You have complete control over the process. You choose the voices, control the pacing, and can customize every aspect of the output. Whether you're converting textbooks for studying, novels for entertainment, or research papers for review, this approach works beautifully.
I've been using this method for months now, and it's completely changed how I consume written content. Long commutes have become opportunities to catch up on reading, and I can multitask while "reading" books I'd never have time to sit down with.
Give it a try with your favorite ebook, and you might just discover a whole new way to enjoy literature. Happy listening!
Have questions or run into issues? The Kokoro and PyPDF2 communities are very helpful, and you can also reach out to us directly for support. Always remember to respect copyright when converting books to audiobooks: only convert content you own or have the right to use.
Need Help with Custom AI Solutions?
Whether you need automation scripts, AI integrations, or custom development projects, our team at TrixlyAI is here to help bring your ideas to life.