PDF Processing

Extract, merge, and manipulate PDF files with Python automation

16036 Stars

Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.

pdf-processing text-extraction document-automation table-extraction form-filling file-merging data-extraction python

User Prompt

I need to extract all tables from this quarterly report PDF and convert them to CSV format

Agent Response

Python code that uses pdfplumber to extract tables and save them as structured CSV files

Quick Start (3 Steps)

Get up and running in minutes

Install

claude-code skill install pdf-processing

Config

First Trigger

@pdf-processing help

Commands

Command	Description	Required Args
@pdf-processing extract-data-from-reports	Extract text and tables from PDF reports for analysis	None
@pdf-processing batch-document-processing	Process multiple PDF files to extract text content	None
@pdf-processing pdf-document-management	Merge, split, or reorganize PDF documents	None

Typical Use Cases

Extract Data from Reports

Extract text and tables from PDF reports for analysis

Batch Document Processing

Process multiple PDF files to extract text content

PDF Document Management

Merge, split, or reorganize PDF documents

Overview

PDF Processing

Quick start

Use pdfplumber to extract text from PDFs:

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
    print(text)

Extracting tables

Extract tables from PDFs with automatic detection:

import pdfplumber

with pdfplumber.open("report.pdf") as pdf:
    page = pdf.pages[0]
    tables = page.extract_tables()

    for table in tables:
        for row in table:
            print(row)
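
Each extracted table is a list of rows, and cells that pdfplumber could not fill come back as None. As a minimal sketch, assuming the first row of a table is its header (an assumption about the PDF's layout, not something pdfplumber guarantees), the rows can be turned into dictionaries. The helper name `table_to_records` and the sample data are hypothetical.

```python
def table_to_records(table):
    """Convert one extracted table (a list of rows) into a list of dicts.

    Assumes the first row is the header; that is a guess about the layout.
    """
    if not table:
        return []
    # Missing cells arrive as None; normalise them to empty strings
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


# Hypothetical sample shaped like one table from page.extract_tables()
sample = [["Quarter", "Revenue"], ["Q1", "100"], ["Q2", None]]
print(table_to_records(sample))
```

Tables whose header spans several rows, or that have no header at all, would need different handling.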

Extracting all pages

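Iterate over every page to process a multi-page document:

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    page_texts = []
    for page in pdf.pages:
        # extract_text() may yield None or an empty string for pages with no text layer
        page_texts.append(page.extract_text() or "")

full_text = "\n\n".join(page_texts)
print(full_text)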

Form filling

For PDF form filling, see FORMS.md for the complete guide including field analysis and validation.

Merging PDFs

Combine multiple PDF files:

from pypdf import PdfWriter

# PdfMerger was deprecated and then removed in pypdf 5.0; PdfWriter.append() replaces it
writer = PdfWriter()

for pdf in ["file1.pdf", "file2.pdf", "file3.pdf"]:
    writer.append(pdf)

writer.write("merged.pdf")
writer.close()
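
When the input files come from a directory listing rather than a hard-coded list, their order decides the page order of the merged document. The sketch below shows one possible way to order names so that embedded numbers sort numerically; `natural_key` and `merge_order` are hypothetical helper names, and the file names are made up for illustration.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that orders embedded numbers numerically (file2 before file10)."""
    # Splitting on a captured group alternates text parts (even index)
    # and digit runs (odd index)
    parts = re.split(r"(\d+)", Path(path).name.lower())
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


def merge_order(paths):
    """Return the paths in the order they should be appended to the merged PDF."""
    return sorted(paths, key=natural_key)


# Hypothetical file names; a plain sorted() would put file10.pdf before file2.pdf
names = ["file10.pdf", "file2.pdf", "file1.pdf"]
print(merge_order(names))
```

The sorted list can stand in for the hard-coded list of file names in the merge loop.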

Splitting PDFs

Extract specific pages or ranges:

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

# Extract pages 2-5 (reader.pages is zero-indexed, so indices 1 through 4)
for page_num in range(1, 5):
    writer.add_page(reader.pages[page_num])

with open("output.pdf", "wb") as output:
    writer.write(output)
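
The off-by-one between human page numbers (starting at 1) and `reader.pages` indices (starting at 0) is easy to get wrong. As a rough sketch, a small parser can convert a page specification into zero-based indices before any pages are copied. The function name `parse_page_ranges` and the `"2-5,8"` spec format are assumptions made for this example, not part of pypdf.

```python
def parse_page_ranges(spec, total_pages):
    """Turn a 1-based spec such as "2-5,8" into sorted zero-based page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        # A single number such as "8" is treated as the one-page range 8-8
        end = int(end_text) if sep else start
        if start < 1 or end < start or end > total_pages:
            raise ValueError(f"Invalid page range {part!r} for a {total_pages}-page document")
        # Convert 1-based inclusive pages to zero-based indices
        indices.update(range(start - 1, end))
    return sorted(indices)


print(parse_page_ranges("2-5", 10))
print(parse_page_ranges("1,3-4", 4))
```

Each returned index can then be used as `reader.pages[index]`, with `len(reader.pages)` supplying `total_pages`.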

Available packages

pdfplumber - Text and table extraction (recommended)
pypdf - PDF manipulation, merging, splitting
pdf2image - Convert PDFs to images (requires poppler)
pytesseract - OCR for scanned PDFs (requires tesseract)
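
Because two of these packages depend on external programs, it can help to check the environment before processing anything. The following is a minimal sketch using only the standard library. It assumes the import names match the package names listed above, that poppler is detectable through its `pdftoppm` executable, and that tesseract is on PATH under the name `tesseract`. `check_environment` is a hypothetical helper name.

```python
import importlib.util
import shutil

# Import names for the four packages listed above
PYTHON_PACKAGES = ["pdfplumber", "pypdf", "pdf2image", "pytesseract"]
# Executables assumed to back pdf2image (pdftoppm, from poppler) and pytesseract
SYSTEM_TOOLS = ["pdftoppm", "tesseract"]


def check_environment():
    """Report which packages can be imported and which system tools are on PATH."""
    packages = {name: importlib.util.find_spec(name) is not None for name in PYTHON_PACKAGES}
    tools = {name: shutil.which(name) is not None for name in SYSTEM_TOOLS}
    return {"packages": packages, "tools": tools}


report = check_environment()
for kind, entries in report.items():
    for name, available in entries.items():
        # kind[:-1] turns "packages" into "package" and "tools" into "tool"
        print(f"{kind[:-1]} {name}: {'found' if available else 'missing'}")
```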

Common patterns

Extract and save text:

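import pdfplumber

with pdfplumber.open("input.pdf") as pdf:
    # Treat pages without extractable text as empty strings so join() cannot fail on None
    text = "\n\n".join(page.extract_text() or "" for page in pdf.pages)

# Write UTF-8 explicitly so the output does not depend on the platform default encoding
with open("output.txt", "w", encoding="utf-8") as f:
    f.write(text)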

Extract tables to CSV:

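import csv
import pdfplumber

with pdfplumber.open("tables.pdf") as pdf:
    # Only the first page is read here; loop over pdf.pages to cover the whole document
    tables = pdf.pages[0].extract_tables()

    # All tables are appended to one file, one CSV row per table row
    with open("output.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for table in tables:
            writer.writerows(table)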
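
Writing every table into one file loses the boundary between tables. One possible alternative, sketched here under stated assumptions, is to write each table to its own CSV file. The helper `write_tables`, its `tables_by_page` argument, and the `pageN_tableM.csv` naming scheme are all hypothetical choices for this example; the sample data stands in for real `extract_tables()` output.

```python
import csv
import tempfile
from pathlib import Path


def write_tables(tables_by_page, out_dir):
    """Write each table to its own CSV file and return the paths created.

    tables_by_page maps a 1-based page number to the list of tables found on
    that page, for example {1: page.extract_tables()}.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for page_number, tables in sorted(tables_by_page.items()):
        for table_index, table in enumerate(tables, start=1):
            # The file naming scheme is an arbitrary choice for this sketch
            path = out_dir / f"page{page_number}_table{table_index}.csv"
            with open(path, "w", newline="", encoding="utf-8") as f:
                # csv.writer writes None cells as empty fields
                csv.writer(f).writerows(table)
            written.append(path)
    return written


# Hypothetical sample data written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    sample = {1: [[["Region", "Sales"], ["North", "10"]]], 2: [[["Item", None]]]}
    for path in write_tables(sample, tmp):
        print(path.name)
```

With a real document, the dictionary could be built by enumerating `pdf.pages` from 1 and calling `extract_tables()` on each page.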

Error handling

Guard against empty documents, pages with no extractable text, and files that fail to open:

import pdfplumber

try:
    with pdfplumber.open("document.pdf") as pdf:
        if len(pdf.pages) == 0:
            print("PDF has no pages")
        else:
            text = pdf.pages[0].extract_text()
            if text is None or text.strip() == "":
                print("Page contains no extractable text (might be scanned)")
            else:
                print(text)
except Exception as e:
    print(f"Error processing PDF: {e}")
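
For batch document processing, the same checks can be applied per file so that one bad PDF does not stop the run. The sketch below is one way this might look: `process_batch` is a hypothetical helper that accepts any extraction callable, and `fake_extract` is a stand-in used only so the example runs without PDF files. In practice the callable would wrap the pdfplumber code shown above.

```python
def process_batch(paths, extract):
    """Run extract(path) on every path, collecting results and failures separately.

    extract is any callable that returns the text of one PDF.
    """
    results, errors = {}, {}
    for path in paths:
        try:
            text = extract(path)
        except Exception as exc:  # broad catch, mirroring the single-file example
            errors[path] = str(exc)
            continue
        if not text or not text.strip():
            errors[path] = "no extractable text (might be scanned)"
        else:
            results[path] = text
    return results, errors


# Hypothetical stand-in extractor so the sketch runs without any PDF files
def fake_extract(path):
    if path == "broken.pdf":
        raise ValueError("cannot open file")
    return {"report.pdf": "Quarterly results", "scan.pdf": ""}[path]


results, errors = process_batch(["report.pdf", "scan.pdf", "broken.pdf"], fake_extract)
print(results)
print(errors)
```

Files that end up in `errors` with the "might be scanned" message are candidates for the OCR route via pdf2image and pytesseract.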