
Rawdigs Audio Service

Overview

The Rawdigs Audio Service is a dedicated Python-based microservice designed to handle audio file processing for the Rawdigs platform. It works alongside the main Laravel application by offloading computationally intensive audio processing tasks such as transcoding, waveform generation, loudness normalization, and audio analysis. The service listens to media upload events via RabbitMQ, processes the audio files stored in Cloudflare R2, and publishes processed results back for consumption by the Laravel app and frontend clients.

This architecture allows the Laravel app to remain responsive while delegating heavy audio processing to a scalable, asynchronous Python microservice.


Architecture Diagram

```mermaid
graph LR
    A[Laravel App] -->|Publishes media.uploaded| B[RabbitMQ]
    B -->|Consumes media.uploaded| C[Python Audio Service]
    C -->|Reads/Writes| D[Cloudflare R2 Storage]
    C -->|Publishes media.processed| B
    B -->|Consumes media.processed| A
    A -->|Serves| E[React Frontend]
    D -->|Serves Audio/Artwork| F[CDN]
```

Components

| Component | Description |
| --- | --- |
| Laravel | Main backend app handling user requests & API |
| RabbitMQ | Message broker for event-driven communication |
| Python | Audio microservice using FastAPI & Celery |
| Cloudflare R2 | Object storage for raw and processed audio/artwork |
| CDN | Content Delivery Network for streaming/downloads |
| React Frontend | UI consuming processed audio data |

Python Stack

| Library/Tool | Purpose |
| --- | --- |
| FastAPI | REST API framework for service endpoints |
| Celery | Distributed task queue for asynchronous jobs |
| FFmpeg | Audio/video transcoding and format conversion |
| librosa | Audio analysis and feature extraction |
| pyloudnorm | Loudness normalization (EBU R128) |
| Essentia | Audio analysis library (C++ core + Python bindings) |
| mutagen | Audio metadata reading and writing |
| boto3 | AWS SDK used for Cloudflare R2's S3-compatible API |

This section refines codec, library, container, and monitoring choices for production-grade audio processing at scale.

Codec & Streaming Matrix

| Format | Compatibility | Pros | Cons | Notes |
| --- | --- | --- | --- | --- |
| HLS (AAC-LC) | All browsers (esp. iOS/Safari) | Best UX, cacheable small segments, ABR | AAC encode quality varies by encoder; segment overhead | Use audio-only HLS; 64/128/192 kbps; 2–4s segments; prefer fMP4 if gapless playback is needed |
| Progressive (MP3 320) | Ubiquitous | Simple, robust previews/downloads | No ABR; larger files | ID3v2.3/2.4 tags; embed cover art |
| Progressive (Opus) | Excellent on desktop/Android; not in iOS Safari | Best quality per bitrate | Apple Safari gap | Offer as optional progressive streams or downloads |
| Lossless (FLAC, WAV/AIFF) | FLAC widely supported; WAV universal | Archival/mastering grade | Large files | Transcode from master; do not cascade from lossy |

Encoders & Licensing

  • FFmpeg with libfdk_aac (best AAC quality). If unavailable, use FFmpeg’s native aac at -q:a 2–3 or -b:a 128/192k.
  • Opus (libopus) for progressive or downloads; do not use for HLS targeting iOS.
  • MP3 (libmp3lame) at 320 kbps for downloads/previews.
  • FLAC at compression level 8 for lossless.
  • Check codec patent/licensing in your jurisdiction; avoid distributing encoders with incompatible licenses in closed builds.
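
As a rough sketch (not a canonical pipeline), these encoder choices map onto FFmpeg invocations like the following, driven from Python via subprocess as recommended in the Media DSP section below. Paths are placeholders, and libfdk_aac is only available if your FFmpeg build enables it:

```python
import subprocess


def run_ffmpeg(args: list[str]) -> None:
    """Run FFmpeg, failing loudly on a non-zero exit code."""
    subprocess.run(["ffmpeg", "-y", "-hide_banner", *args], check=True)


src = "/tmp/master.wav"  # placeholder input path

# AAC-LC 128 kbps (requires an FFmpeg build with libfdk_aac enabled)
run_ffmpeg(["-i", src, "-c:a", "libfdk_aac", "-b:a", "128k", "/tmp/out_fdk_128.m4a"])

# Fallback: FFmpeg's native AAC encoder
run_ffmpeg(["-i", src, "-c:a", "aac", "-b:a", "128k", "/tmp/out_native_128.m4a"])

# MP3 320 kbps for downloads/previews
run_ffmpeg(["-i", src, "-c:a", "libmp3lame", "-b:a", "320k", "/tmp/out_320.mp3"])

# Opus 160 kbps (optional progressive/downloads; skip for iOS-targeted HLS)
run_ffmpeg(["-i", src, "-c:a", "libopus", "-b:a", "160k", "/tmp/out_160.opus"])

# FLAC at compression level 8 for lossless downloads
run_ffmpeg(["-i", src, "-c:a", "flac", "-compression_level", "8", "/tmp/out.flac"])
```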

Loudness & Metering

  • Recommend pyloudnorm (EBU R128) for integrated LUFS/True Peak; note FFmpeg ebur128 filter as a cross-check.
  • Suggested targets: streaming playback normalization around −14 LUFS (track) with a true-peak ceiling of −1.0 dBTP; allow creator opt‑out; store both track and album gain.
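
A minimal measurement sketch with pyloudnorm, assuming the −14 LUFS target above; applying the gain and true-peak limiting would happen downstream (e.g., in the FFmpeg chain):

```python
import librosa
import pyloudnorm as pyln

TARGET_LUFS = -14.0  # assumed streaming target; allow creator opt-out

# Load at native sample rate; pyloudnorm expects float samples in [-1, 1]
y, sr = librosa.load("/tmp/master.wav", sr=None)  # placeholder path

meter = pyln.Meter(sr)                     # EBU R128 / ITU-R BS.1770 meter
integrated = meter.integrated_loudness(y)  # track loudness in LUFS

# Gain (dB) needed to reach the playback-normalization target
gain_db = TARGET_LUFS - integrated

print({"integrated_lufs": round(integrated, 2), "gain_db": round(gain_db, 2)})
```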

Waveform Generation

  • Recommend multi‑resolution min/max bins (10/40/160 ms) normalized to [−1,+1]; include hopSec in JSON.
  • Store gzip-compressed JSON; cache long TTL via CDN.
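
A sketch of the binning approach with librosa and NumPy; the hop durations match the 10/40/160 ms recommendation above, while field names such as hopSec and resolutions are illustrative rather than a fixed schema:

```python
import gzip
import json

import librosa
import numpy as np


def minmax_bins(y: np.ndarray, sr: int, hop_sec: float) -> dict:
    """Min/max peaks per fixed-duration bin, normalized to [-1, +1]."""
    hop = max(1, int(sr * hop_sec))
    pad = (-len(y)) % hop                       # zero-pad so length divides evenly
    framed = np.pad(y, (0, pad)).reshape(-1, hop)
    peak = np.max(np.abs(y)) or 1.0             # avoid division by zero on silence
    return {
        "hopSec": hop_sec,
        "min": (framed.min(axis=1) / peak).round(4).tolist(),
        "max": (framed.max(axis=1) / peak).round(4).tolist(),
    }


y, sr = librosa.load("/tmp/master.wav", sr=None, mono=True)  # placeholder path
waveform = {
    "duration": float(len(y) / sr),
    "resolutions": [minmax_bins(y, sr, h) for h in (0.010, 0.040, 0.160)],
}

# Store gzip-compressed JSON for long-TTL CDN caching
with gzip.open("/tmp/waveform.json.gz", "wt") as f:
    json.dump(waveform, f)
```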

Analysis Libraries

| Library | Strengths | Trade-offs | Use When |
| --- | --- | --- | --- |
| Essentia | Studio-grade tonal/tempo/MIR | Heavier install; larger image | You need reliable key/tempo or MIR descriptors |
| librosa | Lightweight, Pythonic, fast prototyping | Not as accurate for tonal tasks as Essentia | General analysis, waveform bins |
| pyloudnorm | Standards-based LUFS/TP | CPU cost on long files | Consistent loudness measurement |
| torchaudio/onnxruntime (optional) | ML models, embeddings | Adds CUDA/ONNX deps | Similarity search, classification |

Containers & Build

  • Base: python:3.11-slim + ffmpeg (custom build enabling libfdk_aac, libopus, libmp3lame, libvpx optional).
  • Separate “deep-analysis” image with Essentia to keep main worker light.
  • Multi‑stage Docker builds; cache wheels; non-root user; PYTHONDONTWRITEBYTECODE=1.

Monitoring & SRE

  • Prometheus: Celery task durations/counts, RabbitMQ exporter, FFmpeg exit codes.
  • Logs: structured JSON including trace_id, media_id, job_kind.
  • Alerts: DLQ depth & consumer lag, error rate & P95 task duration, low CDN hit ratio.
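
A small prometheus-client sketch for the worker side; the metric names and scrape port are assumptions, not an established convention:

```python
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names -- align with your dashboards and alerts
TASK_DURATION = Histogram(
    "audio_task_duration_seconds", "Task wall-clock time", ["job_kind"]
)
FFMPEG_EXITS = Counter(
    "audio_ffmpeg_exits_total", "FFmpeg exit codes", ["exit_code"]
)


def record_ffmpeg_exit(code: int) -> None:
    FFMPEG_EXITS.labels(exit_code=str(code)).inc()


# Expose /metrics for Prometheus to scrape (port is an assumption)
start_http_server(9100)

# Usage inside a task: time the FFmpeg pipeline under a job_kind label
with TASK_DURATION.labels(job_kind="transcode").time():
    pass  # run the transcode here
```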

Python Frameworks & Libraries — Deep Dive

This section provides a comparative overview and best-practice guidance for the core Python frameworks and libraries powering the Rawdigs Audio Service.

Service/API Frameworks

| Framework | Startup Speed | Type Safety / Validation | Async Support | Ecosystem | When to Choose |
| --- | --- | --- | --- | --- | --- |
| FastAPI | Fast | Pydantic (strong) | Native | Modern, growing | Modern async APIs, OpenAPI docs, type-checked contracts |
| Flask (+ Flask-API) | Very fast | Marshmallow/None | Limited (via extensions) | Huge, mature | Minimal APIs, quick prototyping, legacy codebases |
| Django REST Framework | Slower | Django models/serializers | No (WSGI) | Very large | When you need the Django ORM/admin or complex auth flows |

Background Jobs & Messaging

| Tool | Broker(s) | Strengths | Weak Spots | Best Fit |
| --- | --- | --- | --- | --- |
| Celery | RabbitMQ, Redis | Mature, flexible, periodic tasks, ETA/countdown | Complex config, heavy worker | Large, mixed workloads; scheduled jobs |
| Dramatiq | RabbitMQ, Redis | Simpler config, asyncio support | Fewer features than Celery | Async jobs, simpler task graphs |
| RQ | Redis | Super simple, lightweight | No RabbitMQ, fewer features | Small/simple jobs, Redis-only stacks |
| Faust/Kafka | Kafka | Stream processing, windowed jobs | Not for classic queue tasks | Real-time pipelines, streaming events |

RabbitMQ Clients

  • aio-pika: Asyncio-native, best for FastAPI async consumers/producers.
  • pika: Classic, blocking; use for simple scripts or sync code.
  • kombu: Abstraction used by Celery; for advanced multi-broker needs.
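
A minimal aio-pika consumer sketch for media.uploaded messages; the queue name, broker URL, and prefetch value are placeholders consistent with the examples elsewhere in this document:

```python
import asyncio
import json

import aio_pika


async def main() -> None:
    # Placeholder URL; mirrors the docker-compose broker below
    connection = await aio_pika.connect_robust("amqp://guest:guest@rabbitmq/")
    channel = await connection.channel()
    await channel.set_qos(prefetch_count=4)  # back-pressure, per the ops checklist

    queue = await channel.declare_queue("media.uploaded", durable=True)

    async with queue.iterator() as messages:
        async for message in messages:
            async with message.process():  # ack on success, requeue on exception
                event = json.loads(message.body)
                print("received", event.get("event"), event.get("trace_id"))


asyncio.run(main())
```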

Media DSP & Analysis

  • ffmpeg via subprocess: Full control, robust error handling; recommended for production (can parse stderr, manage temp files, avoid GIL issues).
  • ffmpeg-python: Pythonic wrapper; quick for prototyping, but less transparent for debugging. Use only for simple pipelines.
  • pyloudnorm: EBU R128 LUFS/True Peak metering; pure Python, reliable for normalization targets.
  • librosa: Fast waveform binning, onset/tempo, mel/chroma; tip: resample once, process in chunks for large files.
  • Essentia: Studio-grade tonal/tempo/MIR; install in a separate Docker image to keep main worker slim.
  • Optional ML: demucs/spleeter (source separation), torchaudio (embeddings), onnxruntime (classification); beware large images and GPU/ONNX dependencies.

Storage & I/O

  • boto3: Official AWS SDK; robust, blocking. Good for most tasks.
  • aioboto3/aiobotocore: Async wrappers; use for concurrent downloads/uploads in async code.
  • smart_open: Unified file interface for S3, GCS, local, etc.; handy for streaming large objects.
  • For large uploads: Use multipart transfer (boto3.s3.transfer.TransferConfig); stream to/from disk to avoid memory spikes.
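
A sketch of the multipart-transfer advice using boto3's TransferConfig against R2's S3-compatible endpoint; the thresholds, bucket, and keys are illustrative:

```python
import boto3
from boto3.s3.transfer import TransferConfig

# R2 exposes an S3-compatible API; endpoint and credentials are placeholders
s3 = boto3.client(
    "s3",
    endpoint_url="https://<account_id>.r2.cloudflarestorage.com",
    aws_access_key_id="<your-access-key>",
    aws_secret_access_key="<your-secret-key>",
)

# Multipart transfer kicks in above the threshold; tune for throughput vs. memory
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=16 * 1024 * 1024,
    max_concurrency=8,
    use_threads=True,
)

# Stream to/from disk instead of holding whole objects in memory
s3.download_file("rawdigs-audio", "raw/1234567890.wav", "/tmp/master.wav", Config=config)
s3.upload_file("/tmp/master.flac", "rawdigs-audio", "processed/1234567890/flac.flac", Config=config)
```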

Config & Validation

  • pydantic v2: Modern, fast, strict validation and parsing for request/response models and internal configs.
  • pydantic-settings: Manage settings from .env/environment; 12-factor ready.
  • Store secrets/config in environment or .env (never in code).
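
A minimal pydantic-settings sketch; the field names mirror the environment variables in the docker-compose example below, and the extra loudness setting is purely illustrative:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """12-factor config: values come from environment variables or a local .env file."""

    model_config = SettingsConfigDict(env_file=".env")

    rabbitmq_url: str = "amqp://guest:guest@rabbitmq:5672/"
    r2_endpoint: str
    r2_access_key: str
    r2_secret_key: str
    loudness_target_lufs: float = -14.0  # illustrative extra setting


settings = Settings()  # raises a validation error if required values are missing
```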

Observability

  • Metrics: prometheus-client for custom metrics; RabbitMQ and Celery exporters for queue/task stats; count FFmpeg exit codes.
  • Tracing: opentelemetry-instrumentation-fastapi, opentelemetry-instrumentation-celery, OTLP exporter for distributed traces.
  • Logging: Use structlog or loguru with JSON output. Always include fields: trace_id, media_id, job_kind, artifact_key.
  • Error reporting: Integrate sentry-sdk for alerting and stack traces.
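
A structlog configuration sketch that emits JSON with the fields listed above; this processor chain is one reasonable option, not the only one:

```python
import structlog

# Render structured logs as JSON with ISO timestamps and a level field
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

log = structlog.get_logger()

# Bind per-job fields once so every line in the job carries them
job_log = log.bind(trace_id="abc123def456", media_id="1234567890", job_kind="transcode")
job_log.info("transcode.started", artifact_key="processed/1234567890/hls/master.m3u8")
```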

Performance & Concurrency

  • Use uvicorn with --http httptools --loop uvloop for best FastAPI performance.
  • Separate I/O-bound (download/upload) from CPU-bound (FFmpeg/analysis) worker types for resource efficiency.
  • Prefer process pools (e.g., Celery prefork) for CPU-heavy work; avoid running FFmpeg/Essentia in asyncio event loop.
  • Pin Celery's --prefetch-multiplier and RabbitMQ consumer prefetch_count to control job concurrency and back-pressure.
  • Use local ephemeral NVMe (/tmp) as scratch space for transcoding—avoid network disks for temp files.
  • For large files, use chunked S3 transfers and tune TransferConfig to optimize throughput and memory.
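
A sketch of Celery settings reflecting these points; the concrete values and the io/cpu queue split (including the hypothetical upload_artifacts task) are assumptions to illustrate the shape of the config:

```python
from celery import Celery

app = Celery("audio_tasks", broker="amqp://guest:guest@rabbitmq:5672//")

app.conf.update(
    # Prefork process pool for CPU-heavy FFmpeg/analysis work
    worker_pool="prefork",
    worker_concurrency=4,             # roughly one process per core
    worker_prefetch_multiplier=1,     # keep back-pressure in RabbitMQ, not in the worker
    task_acks_late=True,              # redeliver if a worker dies mid-job
    task_reject_on_worker_lost=True,
    task_time_limit=60 * 30,          # hard cap on runaway transcodes
)

# Route I/O-bound and CPU-bound work to different queues/worker types
app.conf.task_routes = {
    "tasks.transcode_audio": {"queue": "cpu"},
    "tasks.analyze_audio": {"queue": "cpu"},
    "tasks.upload_artifacts": {"queue": "io"},  # hypothetical task name
}
```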

Testing & Quality

  • Use pytest, pytest-cases for test parametrization, and pytest-xdist for parallel test execution.
  • Maintain golden-sample fixtures: tiny WAV/FLAC files and expected JSON artifacts for regression tests.
  • Type check with mypy (strict mode); format with black, lint with ruff; organize imports with isort.
  • Security: Run bandit for static analysis; check dependencies with pip-audit or safety.
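
A golden-sample test sketch; the fixture layout and the build_analysis helper (a pure function assumed to be factored out of the analysis task) are illustrative:

```python
import json
from pathlib import Path

import pytest

# Hypothetical helper: a pure function factored out of the Celery analysis task
from rawdigs_audio.analysis import build_analysis

FIXTURES = Path(__file__).parent / "fixtures"  # tiny WAV/FLAC files + expected JSON artifacts


@pytest.mark.parametrize("name", ["sine_440hz", "silence_1s"])
def test_analysis_matches_golden_sample(name: str) -> None:
    result = build_analysis(FIXTURES / f"{name}.wav")
    expected = json.loads((FIXTURES / f"{name}.analysis.json").read_text())

    # Loose tolerances: loudness/tempo estimates drift slightly across library versions
    assert result["loudness"] == pytest.approx(expected["loudness"], abs=0.5)
    assert result["tempo"] == pytest.approx(expected["tempo"], abs=1.0)
```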

Dependency Management

  • pip-tools: Simple, works with requirements.txt; good for conservative/prod envs.
  • poetry: Modern, lockfile support, manages virtualenvs; best for reproducible builds and publishing.

Minimal Poetry pyproject.toml skeleton:

```toml
[tool.poetry]
name = "rawdigs-audio-service"
version = "0.1.0"
description = "Audio processing microservice for Rawdigs"
authors = ["Your Name <your@email.com>"]

[tool.poetry.dependencies]
python = "^3.11"
fastapi = "^0.110"
uvicorn = {extras = ["standard"], version = "^0.29"}
celery = "^5.3"
boto3 = "^1.34"
pydantic = "^2.6"
pydantic-settings = "^2.2"
ffmpeg-python = "^0.2"
librosa = "^0.10"
pyloudnorm = "^0.1"
structlog = "^24.1"
prometheus-client = "^0.20"
sentry-sdk = "^1.45"

[tool.poetry.group.dev.dependencies]
pytest = "^8.2"
pytest-cases = "^3.8"
pytest-xdist = "^3.6"
mypy = "^1.10"
black = "^24.4"
ruff = "^0.4"
isort = "^5.13"
bandit = "^1.7"
pip-audit = "^2.7"
```


Deliverables

| Deliverable | Description |
| --- | --- |
| Streaming | HLS AAC-LC 64/128/192 kbps; 2–4s segments; audio-only, TS or fMP4 |
| Progressive Previews | MP3 320 |
| Optional Progressive | Opus 160 kbps (note: not supported in iOS Safari) |
| Downloads | FLAC level 8; MP3 320; optional WAV/AIFF |
| Waveform JSON | Multi-resolution min/max |
| Analysis JSON | LUFS/TP, duration, optional tempo/key |
| Artwork | JPG/WEBP multiple sizes |

Cloudflare R2 Layout Example

```
r2-raw/
  releases/<release_id>/masters/<track_id>.wav
r2-proc/
  tracks/<track_id>/
    hls/master.m3u8
    hls/128k/seg-00001.ts
    hls/192k/seg-00001.ts
    waveform.json
    analysis.json
    cover_600.jpg
  downloads/<release_id>/
    FLAC/<track_num> - <title>.flac
    MP3_320/<track_num> - <title>.mp3
```

Message Schemas

media.uploaded Event Example

```json
{
  "v": 1,
  "event": "media.uploaded",
  "trace_id": "abc123def456",
  "idempotency_key": "upload-1234567890",
  "requested_outputs": ["hls", "waveform", "analysis", "downloads"],
  "data": {
    "media_id": "1234567890",
    "user_id": "user_42",
    "file_path": "rawdigs-audio/raw/1234567890.wav",
    "timestamp": "2024-06-01T12:00:00Z"
  }
}
```
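
A sketch of validating this payload with pydantic v2 before dispatching work; the model mirrors the fields above, and stricter types (e.g., datetime for timestamp) are optional:

```python
from typing import Literal

from pydantic import BaseModel


class MediaUploadedData(BaseModel):
    media_id: str
    user_id: str
    file_path: str
    timestamp: str  # ISO 8601; use datetime for stricter parsing


class MediaUploadedEvent(BaseModel):
    v: int
    event: Literal["media.uploaded"]
    trace_id: str
    idempotency_key: str
    requested_outputs: list[str]
    data: MediaUploadedData


# raw_body would normally be the RabbitMQ message body (bytes or str)
raw_body = (
    '{"v": 1, "event": "media.uploaded", "trace_id": "abc123def456",'
    ' "idempotency_key": "upload-1234567890",'
    ' "requested_outputs": ["hls", "waveform", "analysis", "downloads"],'
    ' "data": {"media_id": "1234567890", "user_id": "user_42",'
    ' "file_path": "rawdigs-audio/raw/1234567890.wav",'
    ' "timestamp": "2024-06-01T12:00:00Z"}}'
)
event = MediaUploadedEvent.model_validate_json(raw_body)
print(event.data.media_id, event.requested_outputs)
```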

media.processed Event Example

```json
{
  "v": 1,
  "event": "media.processed",
  "trace_id": "abc123def456",
  "idempotency_key": "process-1234567890",
  "requested_outputs": ["hls", "waveform", "analysis", "downloads"],
  "data": {
    "media_id": "1234567890",
    "formats": [
      "mp3_320",
      "aac",
      "opus"
    ],
    "artifacts": {
      "hls": "rawdigs-audio/processed/1234567890/hls/master.m3u8",
      "waveform": "rawdigs-audio/processed/1234567890/waveform.json",
      "analysis": "rawdigs-audio/processed/1234567890/analysis.json",
      "downloads": {
        "mp3_320": "rawdigs-audio/processed/1234567890/mp3_320.mp3",
        "flac": "rawdigs-audio/processed/1234567890/flac.flac"
      }
    },
    "timestamp": "2024-06-01T12:30:00Z"
  }
}
```

Operations Checklist

  • Idempotency by track_id/idempotency_key.
  • RabbitMQ: quorum queues, DLX, consumer prefetch 1–4.
  • Concurrency sizing: CPU-bound FFmpeg workers vs analysis workers.
  • Security: short‑TTL signed URLs; tokenized paths; CORS for upload origin.
  • Caching: long TTL for HLS segments/waveform, shorter for manifests.
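
For the signed-URL point, a boto3 sketch against the R2 S3 API; the TTL and object key are illustrative:

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://<account_id>.r2.cloudflarestorage.com",
    aws_access_key_id="<your-access-key>",
    aws_secret_access_key="<your-secret-key>",
)

# Short-TTL signed URL for a processed artifact (key is illustrative)
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "rawdigs-audio", "Key": "processed/1234567890/hls/master.m3u8"},
    ExpiresIn=300,  # 5 minutes
)
print(url)
```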

Development Setup

Below is a simplified docker-compose.yml snippet for local development:

```yaml
version: '3.8'
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"
      - "15672:15672"

  audio-service:
    build: .
    environment:
      - RABBITMQ_URL=amqp://guest:guest@rabbitmq:5672/
      - R2_ENDPOINT=https://<account_id>.r2.cloudflarestorage.com
      - R2_ACCESS_KEY=<your-access-key>
      - R2_SECRET_KEY=<your-secret-key>
    depends_on:
      - rabbitmq
    volumes:
      - ./audio_files:/app/audio_files
    ports:
      - "8000:8000"
```

Example Celery Tasks

NOTE: ffmpeg may be called via subprocess for finer control; ffmpeg-python is optional.

```python
from celery import Celery
import ffmpeg
import librosa
import pyloudnorm as pyln
import boto3
import json

app = Celery('audio_tasks', broker='amqp://guest@rabbitmq//')

s3_client = boto3.client(
    's3',
    endpoint_url='https://<account_id>.r2.cloudflarestorage.com',
    aws_access_key_id='<your-access-key>',
    aws_secret_access_key='<your-secret-key>'
)


@app.task
def transcode_audio(media_id, input_key):
    # Download raw audio from R2
    s3_client.download_file('rawdigs-audio', input_key, f'/tmp/{media_id}.wav')

    # Transcode to MP3 320 kbps
    ffmpeg.input(f'/tmp/{media_id}.wav').output(
        f'/tmp/{media_id}_320.mp3', audio_bitrate='320k'
    ).run()

    # Upload transcoded file
    s3_client.upload_file(
        f'/tmp/{media_id}_320.mp3',
        'rawdigs-audio',
        f'processed/{media_id}/mp3_320.mp3'
    )


@app.task
def analyze_audio(media_id, input_key):
    # Download raw audio from R2 (the file may not exist on this worker yet)
    s3_client.download_file('rawdigs-audio', input_key, f'/tmp/{media_id}.wav')
    y, sr = librosa.load(f'/tmp/{media_id}.wav', sr=None)

    # Integrated loudness (EBU R128)
    meter = pyln.Meter(sr)
    loudness = meter.integrated_loudness(y)

    # Tempo estimation
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

    # Cast NumPy types to plain Python so json.dump succeeds
    analysis = {
        "loudness": float(loudness),
        "tempo": float(tempo),
        "sample_rate": int(sr)
    }

    with open(f'/tmp/{media_id}_analysis.json', 'w') as f:
        json.dump(analysis, f)

    s3_client.upload_file(
        f'/tmp/{media_id}_analysis.json',
        'rawdigs-audio',
        f'processed/{media_id}/analysis.json'
    )
```


Waveform JSON Example

```json
{
  "duration": 180.5,
  "samples": [0, 0.02, 0.1, 0.15, 0.12, 0.05, 0, -0.03, -0.1, -0.12, -0.1, 0, ...]
}
```

Analysis JSON Example

```json
{
  "loudness": -14.3,
  "tempo": 120.5,
  "key": "C Major",
  "spectral_centroid_mean": 3500.5,
  "spectral_bandwidth_mean": 1500.2
}
```

Essentia Explanation

Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval. It provides a comprehensive set of algorithms for audio feature extraction, classification, segmentation, and more. The library offers Python bindings, enabling easy integration into Python projects like the Rawdigs Audio Service. Essentia is used here to extract detailed audio features that go beyond basic statistics, enriching the analysis JSON with musicological insights.

Install via pip install essentia where wheels are available, or base your Docker image on the official Essentia image for reliability.
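
A small usage sketch with Essentia's Python bindings for key and tempo; algorithm names and return shapes can vary between Essentia versions, so treat this as illustrative rather than definitive:

```python
import essentia.standard as es

# Load mono audio at 44.1 kHz (path is a placeholder)
audio = es.MonoLoader(filename="/tmp/master.wav", sampleRate=44100)()

# Global key/scale estimate
key, scale, strength = es.KeyExtractor()(audio)

# Tempo estimate (bpm plus beat positions and confidences)
bpm, _, beats_confidence, _, _ = es.RhythmExtractor2013(method="multifeature")(audio)

print({
    "key": f"{key} {scale}",
    "key_strength": round(float(strength), 2),
    "bpm": round(float(bpm), 1),
})
```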


Future Roadmap

  • Add support for more audio formats and streaming protocols (e.g., MPEG-DASH)
  • Implement real-time audio processing pipelines
  • Enhance waveform data with multi-resolution and zoomable formats
  • Integrate machine learning models for genre and mood classification
  • Improve scalability with Kubernetes and autoscaling workers
  • Add user-configurable processing profiles
  • Expand metadata extraction with automatic tagging

For detailed contribution guidelines, see the CONTRIBUTING.md in each repository.