Paperless-ngx (Document Management System)

What this is

Self-hosted document management system for scanning, storing, indexing, and searching documents using OCR.


Why I set this up

To digitise and organise documents (bills, letters, etc.) and make them searchable using OCR.


Where it runs

  • Host: micropc

  • Local access: http://192.168.86.100:8456

  • External access: via reverse proxy (password-protected, served through a Cloudflare Tunnel)

Note: The actual URL is intentionally not shared publicly for security reasons.


Docker Compose

services: 
  broker:
    image: docker.io/library/redis:7
    container_name: paperless-redis
    restart: unless-stopped
    volumes:
      - ./redis:/data

  db:
    image: docker.io/library/postgres:16
    container_name: paperless-db
    restart: unless-stopped
    volumes:
      - ./db:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: <REDACTED>

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperless
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - "8456:8000"
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_DBPASS: <REDACTED>  # must match POSTGRES_PASSWORD of the db service
    labels:
      - diun.enable=true

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.7
    container_name: paperless-gotenberg
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: docker.io/apache/tika:latest
    container_name: paperless-tika
    restart: unless-stopped
    labels:
      - diun.enable=true

Environment Configuration (docker-compose.env)

USERMAP_UID=1000
USERMAP_GID=1000

PAPERLESS_URL=<REDACTED>
PAPERLESS_SECRET_KEY=<REDACTED>

PAPERLESS_TIME_ZONE=America/Santo_Domingo

PAPERLESS_TASK_WORKERS=4
PAPERLESS_THREADS_PER_WORKER=2
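
PAPERLESS_SECRET_KEY should be a long random string. A minimal sketch for generating one, assuming a Linux host with /dev/urandom; gen_secret is a hypothetical helper name, not part of Paperless-ngx:

```shell
# gen_secret (hypothetical helper): print 32 random bytes as 64 hex characters.
gen_secret() {
    od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
    echo
}

# Print a line that can be pasted into docker-compose.env.
echo "PAPERLESS_SECRET_KEY=$(gen_secret)"
```

Any other source of randomness works just as well; the point is only that the value is long and unguessable.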

Key Configuration Notes

Services Overview

  • Redis (broker) → background task queue

  • PostgreSQL (db) → database

  • Webserver → main application

  • Gotenberg → converts Office documents and emails to PDF

  • Tika → extracts text and metadata from Office documents (OCR itself runs inside the webserver container)


Networking

  • Web UI published on host port 8456 (container port 8000)

  • Internal services communicate via Docker network (service names)

  • Can be exposed externally via secured reverse proxy


Storage

  • ./data → application data

  • ./media → stored documents

  • ./consume → auto-import directory

  • ./export → exported files


OCR & Processing

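  • OCR of scans and PDFs runs inside the webserver container; Tika (text extraction) and Gotenberg (PDF conversion) handle Office documents and emails

  • Files dropped into ./consume on the host are imported automatically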


Permissions

  • Controlled via USERMAP_UID and USERMAP_GID

  • Must match the UID/GID of the host user that owns the mounted directories
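
A rough way to check ownership of the mounted directories, assuming a Linux host with GNU stat and that the script is run from the folder holding the compose file. owner_of is a hypothetical helper, and 1000:1000 mirrors the USERMAP values above:

```shell
# owner_of (hypothetical helper): print the numeric "UID:GID" owner of a path.
# Uses GNU stat syntax; BSD/macOS stat takes different flags.
owner_of() {
    stat -c '%u:%g' "$1"
}

expected="1000:1000"   # USERMAP_UID:USERMAP_GID from docker-compose.env

for dir in data media consume export; do
    if [ ! -d "$dir" ]; then
        echo "$dir: missing"
    elif [ "$(owner_of "$dir")" = "$expected" ]; then
        echo "$dir: ok"
    else
        echo "$dir: owned by $(owner_of "$dir"), expected $expected"
    fi
done
```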


Performance

  • Workers: 4

  • Threads per worker: 2

  • Adjustable depending on system load; keep workers × threads at or below the host's CPU core count
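
To compare the configured values against this host, a minimal sketch assuming Linux with nproc available. fits_cpu is a hypothetical helper; it only checks that workers × threads does not exceed the core count, which is the usual guidance for these two settings:

```shell
# fits_cpu (hypothetical helper): succeed if WORKERS * THREADS <= CORES.
fits_cpu() {
    [ $(( $1 * $2 )) -le "$3" ]
}

workers=4   # PAPERLESS_TASK_WORKERS
threads=2   # PAPERLESS_THREADS_PER_WORKER
cores=$(nproc)

if fits_cpu "$workers" "$threads" "$cores"; then
    echo "OK: $((workers * threads)) concurrent threads on $cores cores"
else
    echo "Oversubscribed: $((workers * threads)) concurrent threads on $cores cores"
fi
```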


Setup Steps

  1. Create directories:

    • data, media, consume, export, db, redis

  2. Create docker-compose.env (the file named by env_file in the compose file) with the variables listed under Environment Configuration

  3. Start the containers:

    docker compose up -d
    
  4. Open web UI locally: http://192.168.86.100:8456

  5. Complete initial setup in browser
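
Step 1 can be scripted. A minimal sketch, assuming it is run from the folder that holds the compose file; make_dirs is a hypothetical helper name:

```shell
# make_dirs (hypothetical helper): create the bind-mount directories under a base path.
make_dirs() {
    for d in data media consume export db redis; do
        mkdir -p "$1/$d" || return 1
    done
}

# Default to the current directory when no base path is given.
make_dirs "${1:-.}"
```

mkdir -p leaves existing directories untouched, so the script is safe to re-run.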


Problems / Fixes

Documents not processing

  • Check Redis and worker status

  • Ensure all services are running (docker ps)
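
A quick way to spot a stopped service, sketched with the service names from the compose file above. missing_services is a hypothetical helper, and the docker call assumes Docker Compose v2 run from the compose folder:

```shell
# missing_services (hypothetical helper): given a newline-separated list of
# running services, print every expected service that is not in it.
missing_services() {
    for svc in broker db webserver gotenberg tika; do
        printf '%s\n' "$1" | grep -qx "$svc" || echo "$svc"
    done
}

# Only query Docker when the CLI is available on this machine.
if command -v docker >/dev/null 2>&1; then
    running=$(docker compose ps --services --status running)
    missing_services "$running"
fi
```

No output means all five services are up; any name printed is a service to inspect with docker compose logs.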

OCR not working

  • OCR runs in the webserver container; check its logs first (via Dozzle)

  • For Office documents and emails, verify the Tika and Gotenberg containers are running

Permission issues

  • Symptom: files dropped into ./consume do not appear or import

  • Fix:

    • Ensure UID/GID matches host user

    • Check folder ownership

Reverse proxy access

  • External access secured via password + Cloudflare Tunnel

  • Never expose sensitive credentials publicly


Result

  • Fully functional document management system

  • Documents automatically processed and indexed

  • OCR working for searchable content

  • Accessible locally and securely via reverse proxy


Notes

  • Multi-container setup (more complex than most)

  • Requires all dependent services to be healthy

  • Monitored via Dozzle

  • Updates tracked via DIUN