PDF Processing
Extract, merge, and manipulate PDF files with Python automation
Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
See It In Action
User Prompt
I need to extract all tables from this quarterly report PDF and convert them to CSV format.
Agent Response
Python code that uses pdfplumber to extract tables and save them as structured CSV files.
Quick Start (3 Steps)
Get up and running in minutes
Install
claude-code skill install pdf-processing
Config
First Trigger
@pdf-processing help
Commands
| Command | Description | Required Args |
|---|---|---|
| @pdf-processing extract-data-from-reports | Extract text and tables from PDF reports for analysis | None |
| @pdf-processing batch-document-processing | Process multiple PDF files to extract text content | None |
| @pdf-processing pdf-document-management | Merge, split, or reorganize PDF documents | None |
Typical Use Cases
Extract Data from Reports
Extract text and tables from PDF reports for analysis
Batch Document Processing
Process multiple PDF files to extract text content
PDF Document Management
Merge, split, or reorganize PDF documents
Overview
PDF Processing
Quick start
Use pdfplumber to extract text from PDFs:
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    # Extract the text of the first page
    text = pdf.pages[0].extract_text()
    print(text)
Extracting tables
Extract tables from PDFs with automatic detection:
import pdfplumber

with pdfplumber.open("report.pdf") as pdf:
    page = pdf.pages[0]
    # Each table is a list of rows; each row is a list of cell values
    tables = page.extract_tables()

    for table in tables:
        for row in table:
            print(row)
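Each extracted table is a list of rows, and each row is a list of cell strings, with None for empty cells. The following is a minimal sketch, assuming the first row of a table is its header, of turning one such table into a list of dictionaries. The helper name `table_to_records` is hypothetical and not part of pdfplumber.

```python
def table_to_records(table):
    """Convert one extracted table (a list of rows) into a list of dicts.

    Assumes the first row holds the column headers. Empty cells (None)
    become empty strings.
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


# Sample data shaped like one entry of page.extract_tables()
sample = [["Quarter", "Revenue"], ["Q1", "100"], ["Q2", None]]
records = table_to_records(sample)
print(records)
```

Tables without a header row, or with merged header cells, would need different handling; this sketch does not attempt to detect those cases.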
Extracting all pages
Process multi-page documents efficiently:
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    # extract_text() can return None or "" for pages without a text layer,
    # so fall back to an empty string before joining
    full_text = "\n\n".join(page.extract_text() or "" for page in pdf.pages)

    print(full_text)
Form filling
To fill PDF forms, see FORMS.md, the complete guide covering field analysis and validation.
Merging PDFs
Combine multiple PDF files:
from pypdf import PdfWriter

# PdfWriter.append() replaces the deprecated PdfMerger class,
# which was removed in pypdf 5
writer = PdfWriter()

for pdf in ["file1.pdf", "file2.pdf", "file3.pdf"]:
    writer.append(pdf)

writer.write("merged.pdf")
writer.close()
Splitting PDFs
Extract specific pages or ranges:
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

# Extract pages 2-5 (1-based); reader.pages is zero-based, so use indices 1-4
for page_num in range(1, 5):
    writer.add_page(reader.pages[page_num])

with open("output.pdf", "wb") as output:
    writer.write(output)
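The off-by-one between human page numbers (1-based) and `reader.pages` indices (zero-based) is easy to get wrong. A small sketch of a conversion helper follows; the name `page_range_to_indices` is hypothetical, and clamping to the document length is an assumption about the desired behavior.

```python
def page_range_to_indices(first_page, last_page, total_pages):
    """Map an inclusive 1-based page range to zero-based page indices."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    # Clamp to the document length so an oversized range does not
    # produce indices past the last page
    last_page = min(last_page, total_pages)
    return list(range(first_page - 1, last_page))


indices = page_range_to_indices(2, 5, total_pages=10)
print(indices)
```

With pypdf, the result could be used as `for i in page_range_to_indices(2, 5, len(reader.pages)): writer.add_page(reader.pages[i])`.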
Available packages
- pdfplumber - Text and table extraction (recommended)
- pypdf - PDF manipulation, merging, splitting
- pdf2image - Convert PDFs to images (requires poppler)
- pytesseract - OCR for scanned PDFs (requires tesseract)
Common patterns
Extract and save text:
import pdfplumber

with pdfplumber.open("input.pdf") as pdf:
    # Fall back to "" so pages without extractable text do not break the join
    text = "\n\n".join(page.extract_text() or "" for page in pdf.pages)

with open("output.txt", "w", encoding="utf-8") as f:
    f.write(text)
Extract tables to CSV:
import pdfplumber
import csv

with pdfplumber.open("tables.pdf") as pdf:
    tables = pdf.pages[0].extract_tables()

    # All tables from the first page are written to one CSV file, back to back
    with open("output.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for table in tables:
            writer.writerows(table)
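Writing every table into one file mixes rows with different column layouts. As an alternative, here is a minimal sketch that writes each table to its own CSV file. The function name `write_tables_to_csv` and the `table_1.csv`, `table_2.csv` naming scheme are assumptions made for illustration.

```python
import csv
from pathlib import Path


def write_tables_to_csv(tables, out_dir, stem="table"):
    """Write each table to its own CSV file and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, table in enumerate(tables, start=1):
        path = out_dir / f"{stem}_{index}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            # csv.writer emits None cells as empty fields
            csv.writer(f).writerows(table)
        paths.append(path)
    return paths
```

To cover a whole document rather than only the first page, the input could be collected with `tables = [t for page in pdf.pages for t in page.extract_tables()]` inside the `pdfplumber.open(...)` block.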
Error handling
Handle common PDF issues:
import pdfplumber

try:
    with pdfplumber.open("document.pdf") as pdf:
        if len(pdf.pages) == 0:
            print("PDF has no pages")
        else:
            text = pdf.pages[0].extract_text()
            if text is None or text.strip() == "":
                print("Page contains no extractable text (might be scanned)")
            else:
                print(text)
except Exception as e:
    print(f"Error processing PDF: {e}")
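When a page has no extractable text, one option is to fall back to OCR with the pdf2image and pytesseract packages listed under Available packages. The sketch below assumes both packages and their system dependencies (poppler, tesseract) are installed. The helper names `needs_ocr` and `ocr_page` are hypothetical, and the 300 DPI default is an assumption, not a recommendation from the source.

```python
def needs_ocr(text):
    """Return True when extract_text() produced nothing usable."""
    return text is None or text.strip() == ""


def ocr_page(pdf_path, page_number, dpi=300):
    """OCR a single page (1-based page_number) and return its text.

    Requires pdf2image (with poppler) and pytesseract (with tesseract).
    """
    # Imported here so needs_ocr() works without the OCR dependencies
    from pdf2image import convert_from_path
    import pytesseract

    # Render only the requested page to an image
    images = convert_from_path(
        pdf_path, dpi=dpi, first_page=page_number, last_page=page_number
    )
    return pytesseract.image_to_string(images[0])
```

A caller might check `needs_ocr(page.extract_text())` first and call `ocr_page` only for pages that fail the check, since OCR is much slower than reading an existing text layer.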
Performance tips
- Process pages in batches for large PDFs
- Use multiprocessing for multiple files
- Extract only needed pages rather than entire document
- Close PDF objects after use
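Two of the tips above, batching pages and using multiprocessing across files, can be sketched with the standard library. The function names `chunk_indices`, `extract_file_text`, and `extract_many` are hypothetical. Running one file per worker process is an assumption about how the work is best divided.

```python
from concurrent.futures import ProcessPoolExecutor


def chunk_indices(total_pages, batch_size):
    """Split range(total_pages) into consecutive batches of page indices."""
    return [
        list(range(start, min(start + batch_size, total_pages)))
        for start in range(0, total_pages, batch_size)
    ]


def extract_file_text(path):
    """Worker: extract the text of every page in one PDF."""
    import pdfplumber  # imported inside the worker process

    # The with-block closes the PDF as soon as extraction is done
    with pdfplumber.open(path) as pdf:
        return "\n\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_many(paths, max_workers=None):
    """Extract text from several PDFs in parallel, one file per task."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(extract_file_text, paths)))


if __name__ == "__main__":
    # Page indices for a 7-page document processed 3 pages at a time
    print(chunk_indices(total_pages=7, batch_size=3))
```

Calls to `extract_many` should stay under the `if __name__ == "__main__":` guard, because platforms that spawn worker processes re-import the main module.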
Information
- Author: davila7
- Updated: 2026-01-30
- Category: productivity-tools