
Rawdigs Audio Service

Overview

The Rawdigs Audio Service is a dedicated Python-based microservice designed to handle audio file processing for the Rawdigs platform. It works alongside the main Laravel application by offloading computationally intensive audio processing tasks such as transcoding, waveform generation, loudness normalization, and audio analysis. The service listens to media upload events via RabbitMQ, processes the audio files stored in Cloudflare R2, and publishes processed results back for consumption by the Laravel app and frontend clients.

This architecture allows the Laravel app to remain responsive while delegating heavy audio processing to a scalable, asynchronous Python microservice.


Architecture Diagram

```mermaid
graph LR
    A[Laravel App] -->|Publishes media.uploaded| B[RabbitMQ]
    B -->|Consumes media.uploaded| C[Python Audio Service]
    C -->|Reads/Writes| D[Cloudflare R2 Storage]
    C -->|Publishes media.processed| B
    B -->|Consumes media.processed| A
    A -->|Serves| E[React Frontend]
    D -->|Serves Audio/Artwork| F[CDN]
```

Components

| Component | Description |
| --- | --- |
| Laravel | Main backend app handling user requests & API |
| RabbitMQ | Message broker for event-driven communication |
| Python | Audio microservice using FastAPI & Celery |
| Cloudflare R2 | Object storage for raw and processed audio/artwork |
| CDN | Content Delivery Network for streaming/downloads |
| React Frontend | UI consuming processed audio data |

Python Stack

| Library/Tool | Purpose |
| --- | --- |
| FastAPI | REST API framework for service endpoints |
| Celery | Distributed task queue for asynchronous jobs |
| FFmpeg | Audio/video transcoding and format conversion |
| librosa | Audio analysis and feature extraction |
| pyloudnorm | Loudness normalization (EBU R128) |
| Essentia | Audio analysis library (C++ core + Python bindings) |
| mutagen | Audio metadata reading and writing |
| boto3 | AWS SDK used for Cloudflare R2's S3-compatible API |

This section refines codec, library, container, and monitoring choices for production-grade audio processing at scale.

Codec & Streaming Matrix

| Format | Compatibility | Pros | Cons | Notes |
| --- | --- | --- | --- | --- |
| HLS (AAC-LC) | All browsers (esp. iOS/Safari) | Best UX, cacheable small segments, ABR | AAC encode quality varies by encoder; segment overhead | Use audio-only HLS; 64/128/192 kbps; 2–4s segments; prefer fMP4 if gapless playback is needed |
| Progressive (MP3 320) | Ubiquitous | Simple, robust previews/downloads | No ABR; larger files | ID3v2.3/2.4 tags; embed cover art |
| Progressive (Opus) | Excellent on desktop/Android; not in iOS Safari | Best quality per bitrate | Apple Safari gap | Offer as optional progressive streams or downloads |
| Lossless (FLAC, WAV/AIFF) | FLAC widely supported; WAV universal | Archival/mastering grade | Large files | Transcode from master; do not cascade from lossy |

Encoders & Licensing

  • FFmpeg with libfdk_aac (best AAC quality). If unavailable, use FFmpeg’s native aac at -q:a 2–3 or -b:a 128/192k.
  • Opus (libopus) for progressive or downloads; do not use for HLS targeting iOS.
  • MP3 (libmp3lame) at 320 kbps for downloads/previews.
  • FLAC at compression level 8 for lossless.
  • Check codec patent/licensing in your jurisdiction; avoid distributing encoders with incompatible licenses in closed builds.
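
As a rough sketch (not a canonical pipeline), these encoder choices map onto FFmpeg invocations like the following, driven from Python via subprocess as recommended in the Media DSP section below. Paths are placeholders, and libfdk_aac is only available if your FFmpeg build enables it:

```python
import subprocess


def run_ffmpeg(args: list[str]) -> None:
    """Run FFmpeg, failing loudly on a non-zero exit code."""
    subprocess.run(["ffmpeg", "-y", "-hide_banner", *args], check=True)


src = "/tmp/master.wav"  # placeholder input path

# AAC-LC 128 kbps (requires an FFmpeg build with libfdk_aac enabled)
run_ffmpeg(["-i", src, "-c:a", "libfdk_aac", "-b:a", "128k", "/tmp/out_fdk_128.m4a"])

# Fallback: FFmpeg's native AAC encoder
run_ffmpeg(["-i", src, "-c:a", "aac", "-b:a", "128k", "/tmp/out_native_128.m4a"])

# MP3 320 kbps for downloads/previews
run_ffmpeg(["-i", src, "-c:a", "libmp3lame", "-b:a", "320k", "/tmp/out_320.mp3"])

# Opus 160 kbps (optional progressive/downloads; skip for iOS-targeted HLS)
run_ffmpeg(["-i", src, "-c:a", "libopus", "-b:a", "160k", "/tmp/out_160.opus"])

# FLAC at compression level 8 for lossless downloads
run_ffmpeg(["-i", src, "-c:a", "flac", "-compression_level", "8", "/tmp/out.flac"])
```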

Loudness & Metering

  • Recommend pyloudnorm (EBU R128) for integrated LUFS/True Peak; note FFmpeg ebur128 filter as a cross-check.
  • Suggested targets: streaming playback normalization around −14 LUFS (track) with a true-peak ceiling of −1.0 dBTP; allow creator opt‑out; store both track and album gain.
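
A minimal measurement sketch with pyloudnorm, assuming the −14 LUFS target above; applying the gain and true-peak limiting would happen downstream (e.g., in the FFmpeg chain):

```python
import librosa
import pyloudnorm as pyln

TARGET_LUFS = -14.0  # assumed streaming target; allow creator opt-out

# Load at native sample rate; pyloudnorm expects float samples in [-1, 1]
y, sr = librosa.load("/tmp/master.wav", sr=None)  # placeholder path

meter = pyln.Meter(sr)                     # EBU R128 / ITU-R BS.1770 meter
integrated = meter.integrated_loudness(y)  # track loudness in LUFS

# Gain (dB) needed to reach the playback-normalization target
gain_db = TARGET_LUFS - integrated

print({"integrated_lufs": round(integrated, 2), "gain_db": round(gain_db, 2)})
```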

Waveform Generation

  • Recommend multi‑resolution min/max bins (10/40/160 ms) normalized to [−1,+1]; include hopSec in JSON.
  • Store gzip-compressed JSON; cache long TTL via CDN.
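
A sketch of the binning approach with librosa and NumPy; the hop durations match the 10/40/160 ms recommendation above, while field names such as hopSec and resolutions are illustrative rather than a fixed schema:

```python
import gzip
import json

import librosa
import numpy as np


def minmax_bins(y: np.ndarray, sr: int, hop_sec: float) -> dict:
    """Min/max peaks per fixed-duration bin, normalized to [-1, +1]."""
    hop = max(1, int(sr * hop_sec))
    pad = (-len(y)) % hop                       # zero-pad so length divides evenly
    framed = np.pad(y, (0, pad)).reshape(-1, hop)
    peak = np.max(np.abs(y)) or 1.0             # avoid division by zero on silence
    return {
        "hopSec": hop_sec,
        "min": (framed.min(axis=1) / peak).round(4).tolist(),
        "max": (framed.max(axis=1) / peak).round(4).tolist(),
    }


y, sr = librosa.load("/tmp/master.wav", sr=None, mono=True)  # placeholder path
waveform = {
    "duration": float(len(y) / sr),
    "resolutions": [minmax_bins(y, sr, h) for h in (0.010, 0.040, 0.160)],
}

# Store gzip-compressed JSON for long-TTL CDN caching
with gzip.open("/tmp/waveform.json.gz", "wt") as f:
    json.dump(waveform, f)
```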

Analysis Libraries

| Library | Strengths | Trade-offs | Use When |
| --- | --- | --- | --- |
| Essentia | Studio-grade tonal/tempo/MIR | Heavier install; larger image | You need reliable key/tempo or MIR descriptors |
| librosa | Lightweight, Pythonic, fast prototyping | Not as accurate for tonal tasks as Essentia | General analysis, waveform bins |
| pyloudnorm | Standards-based LUFS/TP | CPU cost on long files | Consistent loudness measurement |
| torchaudio/onnxruntime (optional) | ML models, embeddings | Adds CUDA/ONNX deps | Similarity search, classification |

Containers & Build

  • Base: python:3.11-slim + ffmpeg (custom build enabling libfdk_aac, libopus, libmp3lame, libvpx optional).
  • Separate “deep-analysis” image with Essentia to keep main worker light.
  • Multi‑stage Docker builds; cache wheels; non-root user; PYTHONDONTWRITEBYTECODE=1.

Monitoring & SRE

  • Prometheus: Celery task durations/counts, RabbitMQ exporter, FFmpeg exit codes.
  • Logs: structured JSON including trace_id, media_id, job_kind.
  • Alerts: DLQ depth & consumer lag, error rate & P95 task duration, low CDN hit ratio.
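
A small prometheus-client sketch for the worker side; the metric names and scrape port are assumptions, not an established convention:

```python
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names -- align with your dashboards and alerts
TASK_DURATION = Histogram(
    "audio_task_duration_seconds", "Task wall-clock time", ["job_kind"]
)
FFMPEG_EXITS = Counter(
    "audio_ffmpeg_exits_total", "FFmpeg exit codes", ["exit_code"]
)


def record_ffmpeg_exit(code: int) -> None:
    FFMPEG_EXITS.labels(exit_code=str(code)).inc()


# Expose /metrics for Prometheus to scrape (port is an assumption)
start_http_server(9100)

# Usage inside a task: time the FFmpeg pipeline under a job_kind label
with TASK_DURATION.labels(job_kind="transcode").time():
    pass  # run the transcode here
```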

Python Frameworks & Libraries — Deep Dive

This section provides a comparative overview and best-practice guidance for the core Python frameworks and libraries powering the Rawdigs Audio Service.

Service/API Frameworks

| Framework | Startup Speed | Type Safety / Validation | Async Support | Ecosystem | When to Choose |
| --- | --- | --- | --- | --- | --- |
| FastAPI | Fast | Pydantic (strong) | Native | Modern, growing | Modern async APIs, OpenAPI docs, type-checked contracts |
| Flask (+ Flask-API) | Very fast | Marshmallow/None | Limited (via extensions) | Huge, mature | Minimal APIs, quick prototyping, legacy codebases |
| Django REST Framework | Slower | Django models/serializers | No (WSGI) | Very large | When you need the Django ORM/admin or complex auth flows |

Background Jobs & Messaging

| Tool | Broker(s) | Strengths | Weak Spots | Best Fit |
| --- | --- | --- | --- | --- |
| Celery | RabbitMQ, Redis | Mature, flexible, periodic tasks, ETA/countdown | Complex config, heavy worker | Large, mixed workloads; scheduled jobs |
| Dramatiq | RabbitMQ, Redis | Simpler config, asyncio support | Fewer features than Celery | Async jobs, simpler task graphs |
| RQ | Redis | Super simple, lightweight | No RabbitMQ, fewer features | Small/simple jobs, Redis-only stacks |
| Faust/Kafka | Kafka | Stream processing, windowed jobs | Not for classic queue tasks | Real-time pipelines, streaming events |

RabbitMQ Clients

  • aio-pika: Asyncio-native, best for FastAPI async consumers/producers.
  • pika: Classic, blocking; use for simple scripts or sync code.
  • kombu: Abstraction used by Celery; for advanced multi-broker needs.
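
A minimal aio-pika consumer sketch for media.uploaded messages; the queue name, broker URL, and prefetch value are placeholders consistent with the examples elsewhere in this document:

```python
import asyncio
import json

import aio_pika


async def main() -> None:
    # Placeholder URL; mirrors the docker-compose broker below
    connection = await aio_pika.connect_robust("amqp://guest:guest@rabbitmq/")
    channel = await connection.channel()
    await channel.set_qos(prefetch_count=4)  # back-pressure, per the ops checklist

    queue = await channel.declare_queue("media.uploaded", durable=True)

    async with queue.iterator() as messages:
        async for message in messages:
            async with message.process():  # ack on success, requeue on exception
                event = json.loads(message.body)
                print("received", event.get("event"), event.get("trace_id"))


asyncio.run(main())
```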

Media DSP & Analysis

  • ffmpeg via subprocess: Full control, robust error handling; recommended for production (can parse stderr, manage temp files, avoid GIL issues).
  • ffmpeg-python: Pythonic wrapper; quick for prototyping, but less transparent for debugging. Use only for simple pipelines.
  • pyloudnorm: EBU R128 LUFS/True Peak metering; pure Python, reliable for normalization targets.
  • librosa: Fast waveform binning, onset/tempo, mel/chroma; tip: resample once, process in chunks for large files.
  • Essentia: Studio-grade tonal/tempo/MIR; install in a separate Docker image to keep main worker slim.
  • Optional ML: demucs/spleeter (source separation), torchaudio (embeddings), onnxruntime (classification); beware large images and GPU/ONNX dependencies.

Storage & I/O

  • boto3: Official AWS SDK; robust, blocking. Good for most tasks.
  • aioboto3/aiobotocore: Async wrappers; use for concurrent downloads/uploads in async code.
  • smart_open: Unified file interface for S3, GCS, local, etc.; handy for streaming large objects.
  • For large uploads: Use multipart transfer (boto3.s3.transfer.TransferConfig); stream to/from disk to avoid memory spikes.
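
A sketch of the multipart-transfer advice using boto3's TransferConfig against R2's S3-compatible endpoint; the thresholds, bucket, and keys are illustrative:

```python
import boto3
from boto3.s3.transfer import TransferConfig

# R2 exposes an S3-compatible API; endpoint and credentials are placeholders
s3 = boto3.client(
    "s3",
    endpoint_url="https://<account_id>.r2.cloudflarestorage.com",
    aws_access_key_id="<your-access-key>",
    aws_secret_access_key="<your-secret-key>",
)

# Multipart transfer kicks in above the threshold; tune for throughput vs. memory
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=16 * 1024 * 1024,
    max_concurrency=8,
    use_threads=True,
)

# Stream to/from disk instead of holding whole objects in memory
s3.download_file("rawdigs-audio", "raw/1234567890.wav", "/tmp/master.wav", Config=config)
s3.upload_file("/tmp/master.flac", "rawdigs-audio", "processed/1234567890/flac.flac", Config=config)
```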

Config & Validation

  • pydantic v2: Modern, fast, strict validation and parsing for request/response models and internal configs.
  • pydantic-settings: Manage settings from .env/environment; 12-factor ready.
  • Store secrets/config in environment or .env (never in code).
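
A minimal pydantic-settings sketch; the field names mirror the environment variables in the docker-compose example below, and the extra loudness setting is purely illustrative:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """12-factor config: values come from environment variables or a local .env file."""

    model_config = SettingsConfigDict(env_file=".env")

    rabbitmq_url: str = "amqp://guest:guest@rabbitmq:5672/"
    r2_endpoint: str
    r2_access_key: str
    r2_secret_key: str
    loudness_target_lufs: float = -14.0  # illustrative extra setting


settings = Settings()  # raises a validation error if required values are missing
```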

Observability

  • Metrics: prometheus-client for custom metrics; RabbitMQ and Celery exporters for queue/task stats; count FFmpeg exit codes.
  • Tracing: opentelemetry-instrumentation-fastapi, opentelemetry-instrumentation-celery, OTLP exporter for distributed traces.
  • Logging: Use structlog or loguru with JSON output. Always include fields: trace_id, media_id, job_kind, artifact_key.
  • Error reporting: Integrate sentry-sdk for alerting and stack traces.
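
A structlog configuration sketch that emits JSON with the fields listed above; this processor chain is one reasonable option, not the only one:

```python
import structlog

# Render structured logs as JSON with ISO timestamps and a level field
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

log = structlog.get_logger()

# Bind per-job fields once so every line in the job carries them
job_log = log.bind(trace_id="abc123def456", media_id="1234567890", job_kind="transcode")
job_log.info("transcode.started", artifact_key="processed/1234567890/hls/master.m3u8")
```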

Performance & Concurrency

  • Use uvicorn with --http httptools --loop uvloop for best FastAPI performance.
  • Separate I/O-bound (download/upload) from CPU-bound (FFmpeg/analysis) worker types for resource efficiency.
  • Prefer process pools (e.g., Celery prefork) for CPU-heavy work; avoid running FFmpeg/Essentia in asyncio event loop.
  • Pin Celery's --prefetch-multiplier and RabbitMQ consumer prefetch_count to control job concurrency and back-pressure.
  • Use local ephemeral NVMe (/tmp) as scratch space for transcoding—avoid network disks for temp files.
  • For large files, use chunked S3 transfers and tune TransferConfig to optimize throughput and memory.
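
A sketch of Celery settings reflecting these points; the concrete values and the io/cpu queue split (including the hypothetical upload_artifacts task) are assumptions to illustrate the shape of the config:

```python
from celery import Celery

app = Celery("audio_tasks", broker="amqp://guest:guest@rabbitmq:5672//")

app.conf.update(
    # Prefork process pool for CPU-heavy FFmpeg/analysis work
    worker_pool="prefork",
    worker_concurrency=4,             # roughly one process per core
    worker_prefetch_multiplier=1,     # keep back-pressure in RabbitMQ, not in the worker
    task_acks_late=True,              # redeliver if a worker dies mid-job
    task_reject_on_worker_lost=True,
    task_time_limit=60 * 30,          # hard cap on runaway transcodes
)

# Route I/O-bound and CPU-bound work to different queues/worker types
app.conf.task_routes = {
    "tasks.transcode_audio": {"queue": "cpu"},
    "tasks.analyze_audio": {"queue": "cpu"},
    "tasks.upload_artifacts": {"queue": "io"},  # hypothetical task name
}
```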

Testing & Quality

  • Use pytest, pytest-cases for test parametrization, and pytest-xdist for parallel test execution.
  • Maintain golden-sample fixtures: tiny WAV/FLAC files and expected JSON artifacts for regression tests.
  • Type check with mypy (strict mode); format with black, lint with ruff; organize imports with isort.
  • Security: Run bandit for static analysis; check dependencies with pip-audit or safety.
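
A golden-sample test sketch; the fixture layout and the build_analysis helper (a pure function assumed to be factored out of the analysis task) are illustrative:

```python
import json
from pathlib import Path

import pytest

# Hypothetical helper: a pure function factored out of the Celery analysis task
from rawdigs_audio.analysis import build_analysis

FIXTURES = Path(__file__).parent / "fixtures"  # tiny WAV/FLAC files + expected JSON artifacts


@pytest.mark.parametrize("name", ["sine_440hz", "silence_1s"])
def test_analysis_matches_golden_sample(name: str) -> None:
    result = build_analysis(FIXTURES / f"{name}.wav")
    expected = json.loads((FIXTURES / f"{name}.analysis.json").read_text())

    # Loose tolerances: loudness/tempo estimates drift slightly across library versions
    assert result["loudness"] == pytest.approx(expected["loudness"], abs=0.5)
    assert result["tempo"] == pytest.approx(expected["tempo"], abs=1.0)
```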

Dependency Management

  • pip-tools: Simple, works with requirements.txt; good for conservative/prod envs.
  • poetry: Modern, lockfile support, manages virtualenvs; best for reproducible builds and publishing.

Minimal Poetry pyproject.toml skeleton:

```toml
[tool.poetry]
name = "rawdigs-audio-service"
version = "0.1.0"
description = "Audio processing microservice for Rawdigs"
authors = ["Your Name <your@email.com>"]

[tool.poetry.dependencies]
python = "^3.11"
fastapi = "^0.110"
uvicorn = {extras = ["standard"], version = "^0.29"}
celery = "^5.3"
boto3 = "^1.34"
pydantic = "^2.6"
pydantic-settings = "^2.2"
ffmpeg-python = "^0.2"
librosa = "^0.10"
pyloudnorm = "^0.1"
structlog = "^24.1"
prometheus-client = "^0.20"
sentry-sdk = "^1.45"

[tool.poetry.group.dev.dependencies]
pytest = "^8.2"
pytest-cases = "^3.8"
pytest-xdist = "^3.6"
mypy = "^1.10"
black = "^24.4"
ruff = "^0.4"
isort = "^5.13"
bandit = "^1.7"
pip-audit = "^2.7"
```


Deliverables

| Deliverable | Description |
| --- | --- |
| Streaming | HLS AAC-LC 64/128/192 kbps; 2–4s segments; audio-only, TS or fMP4 |
| Progressive Previews | MP3 320 |
| Optional Progressive | Opus 160 kbps (note: not supported in iOS Safari) |
| Downloads | FLAC level 8; MP3 320; optional WAV/AIFF |
| Waveform JSON | Multi-resolution min/max |
| Analysis JSON | LUFS/TP, duration, optional tempo/key |
| Artwork | JPG/WEBP multiple sizes |

Cloudflare R2 Layout Example

```
r2-raw/
  releases/<release_id>/masters/<track_id>.wav
r2-proc/
  tracks/<track_id>/
    hls/master.m3u8
    hls/128k/seg-00001.ts
    hls/192k/seg-00001.ts
    waveform.json
    analysis.json
    cover_600.jpg
  downloads/<release_id>/
    FLAC/<track_num> - <title>.flac
    MP3_320/<track_num> - <title>.mp3
```

Message Schemas

media.uploaded Event Example

```json
{
  "v": 1,
  "event": "media.uploaded",
  "trace_id": "abc123def456",
  "idempotency_key": "upload-1234567890",
  "requested_outputs": ["hls", "waveform", "analysis", "downloads"],
  "data": {
    "media_id": "1234567890",
    "user_id": "user_42",
    "file_path": "rawdigs-audio/raw/1234567890.wav",
    "timestamp": "2024-06-01T12:00:00Z"
  }
}
```
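
A sketch of validating this payload with pydantic v2 before dispatching work; the model mirrors the fields above, and stricter types (e.g., datetime for timestamp) are optional:

```python
from typing import Literal

from pydantic import BaseModel


class MediaUploadedData(BaseModel):
    media_id: str
    user_id: str
    file_path: str
    timestamp: str  # ISO 8601; use datetime for stricter parsing


class MediaUploadedEvent(BaseModel):
    v: int
    event: Literal["media.uploaded"]
    trace_id: str
    idempotency_key: str
    requested_outputs: list[str]
    data: MediaUploadedData


# raw_body would normally be the RabbitMQ message body (bytes or str)
raw_body = (
    '{"v": 1, "event": "media.uploaded", "trace_id": "abc123def456",'
    ' "idempotency_key": "upload-1234567890",'
    ' "requested_outputs": ["hls", "waveform", "analysis", "downloads"],'
    ' "data": {"media_id": "1234567890", "user_id": "user_42",'
    ' "file_path": "rawdigs-audio/raw/1234567890.wav",'
    ' "timestamp": "2024-06-01T12:00:00Z"}}'
)
event = MediaUploadedEvent.model_validate_json(raw_body)
print(event.data.media_id, event.requested_outputs)
```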

media.processed Event Example

```json
{
  "v": 1,
  "event": "media.processed",
  "trace_id": "abc123def456",
  "idempotency_key": "process-1234567890",
  "requested_outputs": ["hls", "waveform", "analysis", "downloads"],
  "data": {
    "media_id": "1234567890",
    "formats": [
      "mp3_320",
      "aac",
      "opus"
    ],
    "artifacts": {
      "hls": "rawdigs-audio/processed/1234567890/hls/master.m3u8",
      "waveform": "rawdigs-audio/processed/1234567890/waveform.json",
      "analysis": "rawdigs-audio/processed/1234567890/analysis.json",
      "downloads": {
        "mp3_320": "rawdigs-audio/processed/1234567890/mp3_320.mp3",
        "flac": "rawdigs-audio/processed/1234567890/flac.flac"
      }
    },
    "timestamp": "2024-06-01T12:30:00Z"
  }
}
```

Operations Checklist

  • Idempotency by track_id/idempotency_key.
  • RabbitMQ: quorum queues, DLX, consumer prefetch 1–4.
  • Concurrency sizing: CPU-bound FFmpeg workers vs analysis workers.
  • Security: short‑TTL signed URLs; tokenized paths; CORS for upload origin.
  • Caching: long TTL for HLS segments/waveform, shorter for manifests.
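
For the signed-URL point, a boto3 sketch against the R2 S3 API; the TTL and object key are illustrative:

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://<account_id>.r2.cloudflarestorage.com",
    aws_access_key_id="<your-access-key>",
    aws_secret_access_key="<your-secret-key>",
)

# Short-TTL signed URL for a processed artifact (key is illustrative)
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "rawdigs-audio", "Key": "processed/1234567890/hls/master.m3u8"},
    ExpiresIn=300,  # 5 minutes
)
print(url)
```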

Development Setup

Below is a simplified docker-compose.yml snippet for local development:

```yaml
version: '3.8'
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"
      - "15672:15672"

  audio-service:
    build: .
    environment:
      - RABBITMQ_URL=amqp://guest:guest@rabbitmq:5672/
      - R2_ENDPOINT=https://<account_id>.r2.cloudflarestorage.com
      - R2_ACCESS_KEY=<your-access-key>
      - R2_SECRET_KEY=<your-secret-key>
    depends_on:
      - rabbitmq
    volumes:
      - ./audio_files:/app/audio_files
    ports:
      - "8000:8000"
```

Example Celery Tasks

NOTE: ffmpeg may be called via subprocess for finer control; ffmpeg-python is optional.

```python
from celery import Celery
import ffmpeg
import librosa
import pyloudnorm as pyln
import boto3
import json

app = Celery('audio_tasks', broker='amqp://guest@rabbitmq//')

s3_client = boto3.client(
    's3',
    endpoint_url='https://<account_id>.r2.cloudflarestorage.com',
    aws_access_key_id='<your-access-key>',
    aws_secret_access_key='<your-secret-key>'
)


@app.task
def transcode_audio(media_id, input_key):
    # Download raw audio from R2
    s3_client.download_file('rawdigs-audio', input_key, f'/tmp/{media_id}.wav')

    # Transcode to MP3 320 kbps
    ffmpeg.input(f'/tmp/{media_id}.wav').output(
        f'/tmp/{media_id}_320.mp3', audio_bitrate='320k'
    ).run()

    # Upload transcoded file
    s3_client.upload_file(
        f'/tmp/{media_id}_320.mp3',
        'rawdigs-audio',
        f'processed/{media_id}/mp3_320.mp3'
    )


@app.task
def analyze_audio(media_id, input_key):
    # Download raw audio from R2 (the file may not exist on this worker yet)
    s3_client.download_file('rawdigs-audio', input_key, f'/tmp/{media_id}.wav')
    y, sr = librosa.load(f'/tmp/{media_id}.wav', sr=None)

    # Integrated loudness (EBU R128)
    meter = pyln.Meter(sr)
    loudness = meter.integrated_loudness(y)

    # Tempo estimation
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

    # Cast NumPy types to plain Python so json.dump succeeds
    analysis = {
        "loudness": float(loudness),
        "tempo": float(tempo),
        "sample_rate": int(sr)
    }

    with open(f'/tmp/{media_id}_analysis.json', 'w') as f:
        json.dump(analysis, f)

    s3_client.upload_file(
        f'/tmp/{media_id}_analysis.json',
        'rawdigs-audio',
        f'processed/{media_id}/analysis.json'
    )
```


Waveform JSON Example

```json
{
  "duration": 180.5,
  "samples": [0, 0.02, 0.1, 0.15, 0.12, 0.05, 0, -0.03, -0.1, -0.12, -0.1, 0, ...]
}
```

Analysis JSON Example

```json
{
  "loudness": -14.3,
  "tempo": 120.5,
  "key": "C Major",
  "spectral_centroid_mean": 3500.5,
  "spectral_bandwidth_mean": 1500.2
}
```

Essentia Explanation

Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval. It provides a comprehensive set of algorithms for audio feature extraction, classification, segmentation, and more. The library offers Python bindings, enabling easy integration into Python projects like the Rawdigs Audio Service. Essentia is used here to extract detailed audio features that go beyond basic statistics, enriching the analysis JSON with musicological insights.

Install via pip install essentia where wheels are available, or base your Docker image on the official Essentia image for reliability.
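
A small usage sketch with Essentia's Python bindings for key and tempo; algorithm names and return shapes can vary between Essentia versions, so treat this as illustrative rather than definitive:

```python
import essentia.standard as es

# Load mono audio at 44.1 kHz (path is a placeholder)
audio = es.MonoLoader(filename="/tmp/master.wav", sampleRate=44100)()

# Global key/scale estimate
key, scale, strength = es.KeyExtractor()(audio)

# Tempo estimate (bpm plus beat positions and confidences)
bpm, _, beats_confidence, _, _ = es.RhythmExtractor2013(method="multifeature")(audio)

print({
    "key": f"{key} {scale}",
    "key_strength": round(float(strength), 2),
    "bpm": round(float(bpm), 1),
})
```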


Future Roadmap

  • Add support for more audio formats and streaming protocols (e.g., MPEG-DASH)
  • Implement real-time audio processing pipelines
  • Enhance waveform data with multi-resolution and zoomable formats
  • Integrate machine learning models for genre and mood classification
  • Improve scalability with Kubernetes and autoscaling workers
  • Add user-configurable processing profiles
  • Expand metadata extraction with automatic tagging

For detailed contribution guidelines, see the CONTRIBUTING.md in each repository.