PDFy: PDF Malware Analysis API

Fast, comprehensive PDF malware scanning with instant results

Overview

PDFy is a web service that analyzes PDF files for malicious content. It offers three interfaces (REST API, web UI, and TUI) for scanning PDFs and detects malware, suspicious keywords, embedded scripts, and other security threats.

Key Features

Documentation

Core Documentation

| Document | Description |
|----------|-------------|
| Product Vision | Product scope, goals, and release intent |
| System Architecture | High-level system boundaries and components |
| Analysis Pipeline | Scan lifecycle and analysis stages |

API & Development

| Document | Description |
|----------|-------------|
| API Contracts | REST API surface and payload definitions |
| Scan Result Schema | Result data structures |
| Design Spec | Product design and implementation details |
| Production Plan | MVP implementation roadmap |

Operations & Security

| Document | Description |
|----------|-------------|
| Privacy & Security | Data retention, deletion, and privacy policies |
| Deployment Guide | Environment setup and deployment |
| ADR Decisions | Technical architecture decisions |

Installation

# Install the analyzer engine
pip install -e ./services/analyzer

# Install API dependencies
pip install -e ./apps/api

Quick Start

Start the API Server

cd apps/api
uvicorn app.main:app --host 0.0.0.0 --port 8000

Usage Options

1. Web Interface

Open apps/web/index.html in your browser for a user-friendly PDF analysis interface.

2. REST API

# Analyze a PDF file
curl -X POST http://localhost:8000/analyze/fast \
  -F "file=@sample.pdf"
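
The same upload can be scripted from Python. The sketch below uses only the standard library and assumes the server accepts an ordinary multipart/form-data upload under the field name `file`, as the curl command suggests. The helper names `build_multipart` and `analyze_fast` are illustrative and are not part of PDFy.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes, boundary: str) -> bytes:
    """Encode a single file as a multipart/form-data body (hypothetical helper)."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail


def analyze_fast(pdf_path: str, base_url: str = "http://localhost:8000") -> dict:
    """POST a PDF to /analyze/fast and return the decoded JSON response."""
    path = Path(pdf_path)
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in the file
    body = build_multipart("file", path.name, path.read_bytes(), boundary)
    request = urllib.request.Request(
        f"{base_url}/analyze/fast",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    # Raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the API server running, `analyze_fast("sample.pdf")` should return the parsed response as a dictionary. Swapping the path in the URL for `/analyze/deep` would target the deep scan, assuming it takes the same upload format.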

3. Terminal Interface (TUI)

# Launch the TUI, if it is implemented
python -m apps.tui.main

API Reference

Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /analyze/fast | POST | Quick analysis with keyword/IOC detection |
| /analyze/deep | POST | Deep scan with PDF metadata and JavaScript analysis |
| /health | GET | API health check |
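
A script that starts the server and then submits files may want to wait until `/health` responds. The following is a minimal sketch of that idea; it assumes only that a healthy server answers `GET /health` with HTTP 200 and makes no assumption about the response body. The function names and the `opener` and `check` parameters are hypothetical and exist to keep the example testable.

```python
import time
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0,
               opener=urllib.request.urlopen) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with opener(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors and urllib's URLError/HTTPError,
        # which are subclasses of OSError.
        return False


def wait_until_healthy(attempts: int = 10, delay: float = 0.5,
                       check=is_healthy) -> bool:
    """Poll the health check up to `attempts` times, sleeping between tries."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` right after launching uvicorn gives the server a few seconds to come up before the first scan request is sent.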

Response Format

{
  "file_name": "document.pdf",
  "sha256": "abc123...",
  "keyword_hits": ["Suspicious keyword"],
  "iocs": {
    "urls": ["http://example.com"],
    "ips": ["192.168.1.1"],
    "emails": ["test@example.com"]
  },
  "summary": {
    "verdict": "suspicious",
    "score": 65,
    "confidence": "high"
  }
}
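
To act on a result programmatically, the response can be reduced to the fields a caller usually needs. This sketch takes its field names from the sample response above; the `ScanSummary` class and `summarize` function are hypothetical, and treating missing fields as empty is an assumption rather than documented behavior. The Scan Result Schema document remains the authoritative definition.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ScanSummary:
    """Flattened view of a scan response (hypothetical, not part of PDFy)."""
    file_name: str
    sha256: str
    verdict: str
    score: int
    confidence: str
    keyword_hits: list = field(default_factory=list)
    ioc_count: int = 0


def summarize(response: dict) -> ScanSummary:
    """Extract the headline fields and count IOCs across all categories."""
    summary = response.get("summary", {})
    iocs = response.get("iocs", {})
    return ScanSummary(
        file_name=response.get("file_name", ""),
        sha256=response.get("sha256", ""),
        verdict=summary.get("verdict", ""),
        score=summary.get("score", 0),
        confidence=summary.get("confidence", ""),
        keyword_hits=list(response.get("keyword_hits", [])),
        ioc_count=sum(len(values) for values in iocs.values()),
    )


# The sample response from this section, used as example input.
SAMPLE = json.loads("""
{
  "file_name": "document.pdf",
  "sha256": "abc123...",
  "keyword_hits": ["Suspicious keyword"],
  "iocs": {
    "urls": ["http://example.com"],
    "ips": ["192.168.1.1"],
    "emails": ["test@example.com"]
  },
  "summary": {"verdict": "suspicious", "score": 65, "confidence": "high"}
}
""")

result = summarize(SAMPLE)
print(f"{result.file_name}: {result.verdict} (score {result.score}, {result.ioc_count} IOCs)")
```

Keeping the flattening in one function means callers do not need to know how the response nests `summary` and `iocs`.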

Testing

# Run tests
pytest

# Run with coverage
pytest --cov

Project Structure

PDFy/
├── apps/
│   ├── api/         # FastAPI REST API
│   ├── web/         # Web interface (HTML/JS)
│   └── tui/         # Terminal interface
├── services/
│   └── analyzer/    # PDF analysis engine
├── docs/            # Documentation
└── README.md        # This file

Security

See the Privacy & Security documentation for data retention, deletion, and privacy policies.

License

See project repositories for licensing information.