Rawdigs Audio Service¶
Overview¶
The Rawdigs Audio Service is a dedicated Python-based microservice designed to handle audio file processing for the Rawdigs platform. It works alongside the main Laravel application by offloading computationally intensive audio processing tasks such as transcoding, waveform generation, loudness normalization, and audio analysis. The service listens to media upload events via RabbitMQ, processes the audio files stored in Cloudflare R2, and publishes processed results back for consumption by the Laravel app and frontend clients.
This architecture allows the Laravel app to remain responsive while delegating heavy audio processing to a scalable, asynchronous Python microservice.
Architecture Diagram¶
```mermaid
graph LR
    A[Laravel App] -->|Publishes media.uploaded| B[RabbitMQ]
    B -->|Consumes media.uploaded| C[Python Audio Service]
    C -->|Reads/Writes| D[Cloudflare R2 Storage]
    C -->|Publishes media.processed| B
    B -->|Consumes media.processed| A
    A -->|Serves| E[React Frontend]
    D -->|Serves Audio/Artwork| F[CDN]
```
Components¶
| Component | Description |
|---|---|
| Laravel | Main backend app handling user requests & API |
| RabbitMQ | Message broker for event-driven communication |
| Python Audio Service | Audio microservice using FastAPI & Celery |
| Cloudflare R2 | Object storage for raw and processed audio/artwork |
| CDN | Content Delivery Network for streaming/downloads |
| React | Frontend UI consuming processed audio data |
Python Stack¶
| Library/Tool | Purpose |
|---|---|
| FastAPI | REST API framework for service endpoints |
| Celery | Distributed task queue for asynchronous jobs |
| FFmpeg | Audio/video transcoding and format conversion |
| librosa | Audio analysis and feature extraction |
| pyloudnorm | Loudness normalization (EBU R128) |
| Essentia | Audio analysis library (C++ core + Python bindings) |
| mutagen | Audio metadata reading and writing |
| boto3 | AWS SDK used for Cloudflare R2 S3-compatible API |
Pro‑Grade Stack & Options (Recommended)¶
This section refines codec, library, container, and monitoring choices for production-grade audio processing at scale.
Codec & Streaming Matrix¶
| Format | Compatibility | Pros | Cons | Notes |
|---|---|---|---|---|
| HLS (AAC-LC) | All browsers (esp. iOS/Safari) | Best UX, cacheable small segments, ABR | AAC encode quality varies by encoder; segment overhead | Use audio-only HLS; 64/128/192 kbps; 2–4s segments; prefer fMP4 if gapless needed |
| Progressive (MP3 320) | Ubiquitous | Simple, robust previews/downloads | No ABR; larger files | ID3v2.3/2.4 tags; embed cover |
| Progressive (Opus) | Excellent on desktop/Android; not in iOS Safari | Best quality per bitrate | Apple Safari gap | Offer as an optional progressive stream or download. |
| Lossless (FLAC, WAV/AIFF) | FLAC widely supported; WAV universal | Archival/mastering grade | Large files | Transcode from master; do not cascade from lossy. |
Encoders & Licensing¶
- FFmpeg with `libfdk_aac` gives the best AAC quality. If it is unavailable, use FFmpeg's native `aac` encoder at `-q:a 2–3` or `-b:a 128k/192k` (see the HLS sketch below).
- Opus (`libopus`) for progressive streams or downloads; do not use it for HLS targeting iOS.
- MP3 (`libmp3lame`) at 320 kbps for downloads/previews.
- FLAC at compression level 8 for lossless.
- Check codec patent/licensing terms in your jurisdiction; avoid distributing encoders with incompatible licenses in closed builds.
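A minimal sketch of an audio-only HLS encode driven through `subprocess`, assuming an FFmpeg build with `libfdk_aac` (swap in the native `aac` encoder otherwise); the function name and paths are illustrative, not part of the service API.

```python
import subprocess
from pathlib import Path


def encode_hls_aac(master_wav: str, out_dir: str, bitrate: str = "128k") -> None:
    """Sketch: audio-only HLS (AAC-LC) with ~4 s segments; paths are illustrative."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = [
        "ffmpeg", "-y", "-i", master_wav,
        "-vn",                                # audio only
        "-c:a", "libfdk_aac",                 # use "aac" if libfdk_aac is not compiled in
        "-b:a", bitrate,
        "-f", "hls",
        "-hls_time", "4",                     # 2-4 s segments per the matrix above
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/seg-%05d.ts",
        f"{out_dir}/index.m3u8",
    ]
    subprocess.run(cmd, check=True, capture_output=True)
```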
Loudness & Metering¶
- Recommend `pyloudnorm` (EBU R128) for integrated LUFS/True Peak; note FFmpeg's `ebur128` filter as a cross-check (see the sketch below).
- Suggested targets: streaming playback normalization around −14 LUFS (track) with a true-peak ceiling of −1.0 dBTP; allow creator opt-out; store both track and album gain.
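A minimal measurement-and-gain sketch using `pyloudnorm` and `librosa`, assuming the −14 LUFS target above; the true-peak ceiling check (e.g. via FFmpeg's `ebur128` or a limiter) is deliberately left out for brevity.

```python
import librosa
import numpy as np
import pyloudnorm as pyln


def normalize_to_target(path: str, target_lufs: float = -14.0) -> np.ndarray:
    """Measure integrated loudness and return gain-adjusted samples (sketch)."""
    y, sr = librosa.load(path, sr=None, mono=True)
    meter = pyln.Meter(sr)                      # EBU R128 / ITU-R BS.1770 meter
    loudness = meter.integrated_loudness(y)     # integrated LUFS of the track
    # Static gain toward the target; a production pipeline would also enforce
    # the -1.0 dBTP ceiling before encoding.
    return pyln.normalize.loudness(y, loudness, target_lufs)
```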
Waveform Generation¶
- Recommend multi-resolution min/max bins (10/40/160 ms) normalized to [−1, +1]; include `hopSec` in the JSON (see the sketch below).
- Store gzip-compressed JSON; cache it with a long TTL via the CDN.
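A single-resolution sketch of the min/max binning using `librosa` and NumPy; the `waveform_bins` helper and the exact output shape are illustrative, not a fixed schema.

```python
import gzip
import json

import librosa
import numpy as np


def waveform_bins(path: str, hop_sec: float = 0.040) -> dict:
    """Sketch: min/max amplitude per bin, normalized to [-1, +1]."""
    y, sr = librosa.load(path, sr=None, mono=True)
    hop = max(1, int(sr * hop_sec))
    pad = (-len(y)) % hop                      # zero-pad so the signal reshapes cleanly
    frames = np.pad(y, (0, pad)).reshape(-1, hop)
    peak = float(np.max(np.abs(y))) or 1.0     # avoid divide-by-zero on silence
    return {
        "hopSec": hop_sec,
        "min": (frames.min(axis=1) / peak).round(3).tolist(),
        "max": (frames.max(axis=1) / peak).round(3).tolist(),
    }


def write_waveform_json(path: str, out_path: str) -> None:
    # One resolution shown; repeat for 10/160 ms bins and serve gzipped via the CDN.
    with gzip.open(out_path, "wt") as f:
        json.dump(waveform_bins(path), f)
```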
Analysis Libraries¶
| Library | Strengths | Trade‑offs | Use When |
|---|---|---|---|
| Essentia | Studio‑grade tonal/tempo/MIR | Heavier install; larger image | You need reliable key/tempo or MIR descriptors |
| librosa | Lightweight, Pythonic, fast prototyping | Not as accurate for tonal tasks as Essentia | General analysis, waveform bins |
| pyloudnorm | Standards‑based LUFS/TP | CPU cost on long files | Consistent loudness measurement |
| torchaudio/onnxruntime (optional) | ML models, embeddings | Adds CUDA/ONNX deps | Similarity search, classification |
Containers & Build¶
- Base: `python:3.11-slim` plus `ffmpeg` (custom build enabling `libfdk_aac`, `libopus`, `libmp3lame`, and optionally `libvpx`).
- Keep a separate "deep-analysis" image with Essentia so the main worker stays light.
- Multi-stage Docker builds; cache wheels; run as a non-root user; set `PYTHONDONTWRITEBYTECODE=1`.
Monitoring & SRE¶
- Prometheus: Celery task durations/counts, RabbitMQ exporter, FFmpeg exit codes.
- Logs: structured JSON including `trace_id`, `media_id`, `job_kind`.
- Alerts: DLQ depth and consumer lag, error rate and P95 task duration, low CDN hit ratio.
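A small `prometheus-client` sketch for the metrics bullet above; the metric names are illustrative and should be aligned with your dashboards.

```python
from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metric names; adjust labels to match your Grafana panels.
TASK_DURATION = Histogram(
    "audio_task_duration_seconds", "Celery task duration", ["job_kind"]
)
FFMPEG_EXITS = Counter(
    "ffmpeg_exit_codes_total", "FFmpeg exit codes by value", ["code"]
)


def record_ffmpeg_exit(code: int) -> None:
    FFMPEG_EXITS.labels(code=str(code)).inc()


if __name__ == "__main__":
    start_http_server(9000)  # expose /metrics for Prometheus to scrape
```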
Python Frameworks & Libraries — Deep Dive¶
This section provides a comparative overview and best-practice guidance for the core Python frameworks and libraries powering the Rawdigs Audio Service.
Service/API Frameworks¶
| Framework | Startup Speed | Type Safety / Validation | Async Support | Ecosystem | When to Choose |
|---|---|---|---|---|---|
| FastAPI | Fast | Pydantic (strong) | Native | Modern, growing | Modern async APIs, OpenAPI docs, type-checked contracts |
| Flask (+ Flask-API) | Very fast | Marshmallow/None | Limited (via extensions) | Huge, mature | Minimal APIs, quick prototyping, legacy codebases |
| Django REST Framework | Slower | Django models/serializers | No (WSGI) | Very large | When you need Django ORM/admin, or complex auth flows |
Background Jobs & Messaging¶
| Tool | Broker(s) | Strengths | Weak Spots | Best Fit |
|---|---|---|---|---|
| Celery | RabbitMQ, Redis | Mature, flexible, periodic, ETA/countdown | Complex config, heavy worker | Large, mixed workloads; scheduled jobs |
| Dramatiq | RabbitMQ, Redis | Simpler config, asyncio support | Fewer features than Celery | Async jobs, simpler task graphs |
| RQ | Redis | Super simple, lightweight | No RabbitMQ, fewer features | Small/simple jobs, Redis-only stacks |
| Faust/Kafka | Kafka | Stream processing, windowed jobs | Not for classic queue tasks | Real-time pipelines, streaming events |
RabbitMQ Clients¶
- aio-pika: Asyncio-native, best for FastAPI async consumers/producers.
- pika: Classic, blocking; use for simple scripts or sync code.
- kombu: Abstraction used by Celery; for advanced multi-broker needs.
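For the asyncio path, a minimal `aio-pika` consumer sketch; the queue name `media.uploaded` matches the event below, while the `handle_upload` coroutine is a hypothetical placeholder.

```python
import asyncio
import json

import aio_pika


async def handle_upload(event: dict) -> None:
    # Placeholder: dispatch a Celery task or process inline.
    print(event["data"]["media_id"])


async def consume(amqp_url: str = "amqp://guest:guest@rabbitmq/") -> None:
    connection = await aio_pika.connect_robust(amqp_url)
    async with connection:
        channel = await connection.channel()
        await channel.set_qos(prefetch_count=4)                      # back-pressure
        queue = await channel.declare_queue("media.uploaded", durable=True)
        async with queue.iterator() as messages:
            async for message in messages:
                async with message.process():                        # ack on success
                    await handle_upload(json.loads(message.body))


if __name__ == "__main__":
    asyncio.run(consume())
```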
Media DSP & Analysis¶
- `ffmpeg` via `subprocess`: full control, robust error handling; recommended for production (parse stderr, manage temp files, avoid GIL issues).
- `ffmpeg-python`: Pythonic wrapper; quick for prototyping but less transparent for debugging. Use only for simple pipelines.
- `pyloudnorm`: EBU R128 LUFS/True Peak metering; pure Python, reliable for normalization targets.
- `librosa`: fast waveform binning, onset/tempo, mel/chroma; tip: resample once and process in chunks for large files.
- `Essentia`: studio-grade tonal/tempo/MIR; install in a separate Docker image to keep the main worker slim.
- Optional ML: `demucs`/`spleeter` (source separation), `torchaudio` (embeddings), `onnxruntime` (classification); beware large images and GPU/ONNX dependencies.
Storage & I/O¶
- `boto3`: official AWS SDK; robust, blocking. Good for most tasks.
- `aioboto3`/`aiobotocore`: async wrappers; use for concurrent downloads/uploads in async code.
- `smart_open`: unified file interface for S3, GCS, local files, etc.; handy for streaming large objects.
- For large uploads: use multipart transfer (`boto3.s3.transfer.TransferConfig`); stream to/from disk to avoid memory spikes (see the sketch below).
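A short multipart-transfer sketch against the R2 S3-compatible endpoint; the bucket, key, and threshold values are placeholders, not tuned recommendations.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3", endpoint_url="https://<account_id>.r2.cloudflarestorage.com")

# Multipart transfers with bounded concurrency keep memory flat on large masters.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MiB
    multipart_chunksize=16 * 1024 * 1024,   # 16 MiB parts
    max_concurrency=4,
)

s3.upload_file(
    "/tmp/master.wav", "r2-raw", "releases/<release_id>/masters/<track_id>.wav", Config=config
)
s3.download_file(
    "r2-raw", "releases/<release_id>/masters/<track_id>.wav", "/tmp/master.wav", Config=config
)
```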
Config & Validation¶
- `pydantic` v2: modern, fast, strict validation and parsing for request/response models and internal configs.
- `pydantic-settings`: manage settings from `.env`/environment; 12-factor ready (see the sketch below).
- Store secrets/config in the environment or `.env` (never in code).
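A small `pydantic-settings` sketch for env-driven configuration; the field names mirror the docker-compose variables further down and are otherwise illustrative.

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Loaded from environment variables or a local .env file."""

    model_config = SettingsConfigDict(env_file=".env")

    rabbitmq_url: str = "amqp://guest:guest@rabbitmq:5672/"
    r2_endpoint: str
    r2_access_key: str
    r2_secret_key: str
    scratch_dir: str = "/tmp"


settings = Settings()  # RABBITMQ_URL, R2_ENDPOINT, ... resolve case-insensitively
```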
Observability¶
- Metrics: `prometheus-client` for custom metrics; RabbitMQ and Celery exporters for queue/task stats; count FFmpeg exit codes.
- Tracing: `opentelemetry-instrumentation-fastapi`, `opentelemetry-instrumentation-celery`, and an OTLP exporter for distributed traces.
- Logging: use `structlog` or `loguru` with JSON output. Always include the fields `trace_id`, `media_id`, `job_kind`, `artifact_key` (see the sketch below).
- Error reporting: integrate `sentry-sdk` for alerting and stack traces.
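A minimal `structlog` JSON-logging sketch carrying the required correlation fields; the bound values are placeholders taken from the message examples below.

```python
import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),   # one JSON object per log line
    ]
)

log = structlog.get_logger()
# Bind the correlation fields once per job, then log normally.
log = log.bind(trace_id="abc123def456", media_id="1234567890", job_kind="transcode")
log.info("transcode.started", artifact_key="processed/1234567890/mp3_320.mp3")
```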
Performance & Concurrency¶
- Use `uvicorn` with `--http httptools --loop uvloop` for best FastAPI performance.
- Separate I/O-bound (download/upload) from CPU-bound (FFmpeg/analysis) worker types for resource efficiency.
- Prefer process pools (e.g., Celery prefork) for CPU-heavy work; avoid running FFmpeg/Essentia in the asyncio event loop.
- Pin Celery's `--prefetch-multiplier` and the RabbitMQ consumer `prefetch_count` to control job concurrency and back-pressure (see the sketch below).
- Use local ephemeral NVMe (`/tmp`) as scratch space for transcoding; avoid network disks for temp files.
- For large files, use chunked S3 transfers and tune `TransferConfig` to optimize throughput and memory.
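A sketch of those concurrency knobs expressed as Celery configuration (the same settings can be passed on the `celery worker` command line); the numbers are starting points, not tuned values.

```python
from celery import Celery

app = Celery("audio_tasks", broker="amqp://guest:guest@rabbitmq:5672//")

app.conf.update(
    worker_prefetch_multiplier=1,    # one unacked job per process: clean back-pressure
    task_acks_late=True,             # re-deliver if a worker dies mid-transcode
    task_reject_on_worker_lost=True,
    worker_concurrency=4,            # prefork processes; size to CPU cores for FFmpeg work
    task_time_limit=1800,            # hard cap (seconds) for runaway encodes
)
```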
Testing & Quality¶
- Use `pytest`, `pytest-cases` for test parametrization, and `pytest-xdist` for parallel test execution.
- Maintain golden-sample fixtures: tiny WAV/FLAC files and expected JSON artifacts for regression tests (see the sketch below).
- Type-check with `mypy` (strict mode); format with `black` and `ruff`; organize imports with `isort`.
- Security: run `bandit` for static analysis; check dependencies with `pip-audit` or `safety`.
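A golden-sample regression test sketch; the fixture files and the `rawdigs_audio.analysis.analyze_audio_file` function under test are hypothetical names chosen for illustration.

```python
import json
from pathlib import Path

import pytest

FIXTURES = Path(__file__).parent / "fixtures"


@pytest.mark.parametrize("name", ["sine_440", "kick_loop"])
def test_analysis_matches_golden_json(name: str) -> None:
    # Hypothetical pure function under test; replace with your real entry point.
    from rawdigs_audio.analysis import analyze_audio_file

    result = analyze_audio_file(FIXTURES / f"{name}.wav")
    expected = json.loads((FIXTURES / f"{name}.analysis.json").read_text())

    # Loose tolerances: loudness and tempo estimates drift slightly across library versions.
    assert result["loudness"] == pytest.approx(expected["loudness"], abs=0.5)
    assert result["tempo"] == pytest.approx(expected["tempo"], abs=1.0)
```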
Dependency Management¶
- `pip-tools`: simple, works with `requirements.txt`; good for conservative/prod environments.
- `poetry`: modern, lockfile support, manages virtualenvs; best for reproducible builds and publishing.
Minimal Poetry pyproject.toml skeleton:
```toml
[tool.poetry]
name = "rawdigs-audio-service"
version = "0.1.0"
description = "Audio processing microservice for Rawdigs"
authors = ["Your Name <your@email.com>"]

[tool.poetry.dependencies]
python = "^3.11"
fastapi = "^0.110"
uvicorn = {extras = ["standard"], version = "^0.29"}
celery = "^5.3"
boto3 = "^1.34"
pydantic = "^2.6"
pydantic-settings = "^2.2"
ffmpeg-python = "^0.2"
librosa = "^0.10"
pyloudnorm = "^0.1"
structlog = "^24.1"
prometheus-client = "^0.20"
sentry-sdk = "^1.45"

[tool.poetry.group.dev.dependencies]
pytest = "^8.2"
pytest-cases = "^3.8"
pytest-xdist = "^3.6"
mypy = "^1.10"
black = "^24.4"
ruff = "^0.4"
isort = "^5.13"
bandit = "^1.7"
pip-audit = "^2.7"
```
Deliverables¶
| Deliverable | Description |
|---|---|
| Streaming | HLS AAC-LC 64/128/192 kbps; 2–4s segments; audio-only, TS or fMP4 |
| Progressive Previews | MP3 320 |
| Optional Progressive | Opus 160 kbps (note: not supported in iOS Safari) |
| Downloads | FLAC level 8; MP3 320; optional WAV/AIFF |
| Waveform JSON | Multi-resolution min/max |
| Analysis JSON | LUFS/TP, duration, optional tempo/key |
| Artwork | JPG/WEBP multiple sizes |
Cloudflare R2 Layout Example¶
```text
r2-raw/
  releases/<release_id>/masters/<track_id>.wav
r2-proc/
  tracks/<track_id>/
    hls/master.m3u8
    hls/128k/seg-00001.ts
    hls/192k/seg-00001.ts
    waveform.json
    analysis.json
    cover_600.jpg
  downloads/<release_id>/
    FLAC/<track_num> - <title>.flac
    MP3_320/<track_num> - <title>.mp3
```
Message Schemas¶
media.uploaded Event Example¶
```json
{
  "v": 1,
  "event": "media.uploaded",
  "trace_id": "abc123def456",
  "idempotency_key": "upload-1234567890",
  "requested_outputs": ["hls", "waveform", "analysis", "downloads"],
  "data": {
    "media_id": "1234567890",
    "user_id": "user_42",
    "file_path": "rawdigs-audio/raw/1234567890.wav",
    "timestamp": "2024-06-01T12:00:00Z"
  }
}
```
media.processed Event Example¶
```json
{
  "v": 1,
  "event": "media.processed",
  "trace_id": "abc123def456",
  "idempotency_key": "process-1234567890",
  "requested_outputs": ["hls", "waveform", "analysis", "downloads"],
  "data": {
    "media_id": "1234567890",
    "formats": ["mp3_320", "aac", "opus"],
    "artifacts": {
      "hls": "rawdigs-audio/processed/1234567890/hls/master.m3u8",
      "waveform": "rawdigs-audio/processed/1234567890/waveform.json",
      "analysis": "rawdigs-audio/processed/1234567890/analysis.json",
      "downloads": {
        "mp3_320": "rawdigs-audio/processed/1234567890/mp3_320.mp3",
        "flac": "rawdigs-audio/processed/1234567890/flac.flac"
      }
    },
    "timestamp": "2024-06-01T12:30:00Z"
  }
}
```
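A `pydantic` v2 sketch that validates the envelope shared by both events before a task is dispatched; the model and field names follow the JSON above, and everything else is an assumption.

```python
from typing import Any, Literal

from pydantic import BaseModel


class MediaEvent(BaseModel):
    """Envelope shared by media.uploaded / media.processed (sketch)."""

    v: int
    event: Literal["media.uploaded", "media.processed"]
    trace_id: str
    idempotency_key: str
    requested_outputs: list[str]
    data: dict[str, Any]   # tighten into per-event models as the schemas stabilize


# Usage: validate a raw RabbitMQ body before handing it to a worker.
raw_body = (
    '{"v": 1, "event": "media.uploaded", "trace_id": "abc123def456", '
    '"idempotency_key": "upload-1234567890", "requested_outputs": ["hls"], '
    '"data": {"media_id": "1234567890"}}'
)
event = MediaEvent.model_validate_json(raw_body)
```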
Operations Checklist¶
- Idempotency keyed by `track_id`/`idempotency_key`.
- RabbitMQ: quorum queues, DLX, consumer prefetch 1–4.
- Concurrency sizing: CPU-bound FFmpeg workers vs. analysis workers.
- Security: short-TTL signed URLs; tokenized paths; CORS for the upload origin.
- Caching: long TTL for HLS segments/waveform, shorter for manifests.
Development Setup¶
Below is a simplified docker-compose.yml snippet for local development:
```yaml
version: '3.8'
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"
      - "15672:15672"
  audio-service:
    build: .
    environment:
      - RABBITMQ_URL=amqp://guest:guest@rabbitmq:5672/
      - R2_ENDPOINT=https://<account_id>.r2.cloudflarestorage.com
      - R2_ACCESS_KEY=<your-access-key>
      - R2_SECRET_KEY=<your-secret-key>
    depends_on:
      - rabbitmq
    volumes:
      - ./audio_files:/app/audio_files
    ports:
      - "8000:8000"
```
Example Celery Tasks¶
NOTE: ffmpeg may be called via subprocess for finer control; ffmpeg-python is optional.¶
```python
import json

import boto3
import ffmpeg
import librosa
import pyloudnorm as pyln
from celery import Celery

app = Celery('audio_tasks', broker='amqp://guest@rabbitmq//')
s3_client = boto3.client('s3', endpoint_url='https://<account_id>.r2.cloudflarestorage.com')


@app.task
def transcode_audio(media_id, input_key):
    # Download raw audio from R2
    s3_client.download_file('rawdigs-audio', input_key, f'/tmp/{media_id}.wav')
    # Transcode to MP3 320 kbps
    ffmpeg.input(f'/tmp/{media_id}.wav').output(f'/tmp/{media_id}_320.mp3', audio_bitrate='320k').run()
    # Upload the transcoded file
    s3_client.upload_file(f'/tmp/{media_id}_320.mp3', 'rawdigs-audio', f'processed/{media_id}/mp3_320.mp3')


@app.task
def analyze_audio(media_id, input_key):
    # Download the raw file again: tasks may run on different workers
    s3_client.download_file('rawdigs-audio', input_key, f'/tmp/{media_id}.wav')
    y, sr = librosa.load(f'/tmp/{media_id}.wav', sr=None)
    # Integrated loudness (EBU R128)
    meter = pyln.Meter(sr)
    loudness = meter.integrated_loudness(y)
    # Tempo estimation
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    # Cast NumPy scalars so json.dump does not fail
    analysis = {"loudness": float(loudness), "tempo": float(tempo), "sample_rate": int(sr)}
    with open(f'/tmp/{media_id}_analysis.json', 'w') as f:
        json.dump(analysis, f)
    s3_client.upload_file(f'/tmp/{media_id}_analysis.json', 'rawdigs-audio', f'processed/{media_id}/analysis.json')
```
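As the note above suggests, the same transcode can instead be driven through `subprocess` for finer control over stderr and exit codes; a hedged sketch with illustrative paths:

```python
import subprocess


def transcode_mp3_320(src_wav: str, dst_mp3: str) -> None:
    """Call FFmpeg directly so stderr and the exit code can be inspected."""
    cmd = ["ffmpeg", "-y", "-i", src_wav, "-vn", "-c:a", "libmp3lame", "-b:a", "320k", dst_mp3]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Surface FFmpeg's own diagnostics instead of a generic failure.
        raise RuntimeError(f"ffmpeg failed ({proc.returncode}): {proc.stderr[-500:]}")
```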
Waveform JSON Example¶
```json
{
  "duration": 180.5,
  "samples": [0, 0.02, 0.1, 0.15, 0.12, 0.05, 0, -0.03, -0.1, -0.12, -0.1, 0, ...]
}
```
Analysis JSON Example¶
```json
{
  "loudness": -14.3,
  "tempo": 120.5,
  "key": "C Major",
  "spectral_centroid_mean": 3500.5,
  "spectral_bandwidth_mean": 1500.2
}
```
Essentia Explanation¶
Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval. It provides a comprehensive set of algorithms for audio feature extraction, classification, segmentation, and more. The library offers Python bindings, enabling easy integration into Python projects like the Rawdigs Audio Service. Essentia is used here to extract detailed audio features that go beyond basic statistics, enriching the analysis JSON with musicological insights.
Install via `pip install essentia` where wheels are available, or base your Docker image on the official Essentia image for reliability. A short extraction sketch follows.
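A minimal key/tempo extraction sketch using Essentia's standard-mode algorithms; treat the specific algorithm choices and parameters as assumptions to validate against your own masters.

```python
import essentia.standard as es


def key_and_tempo(path: str) -> dict:
    """Sketch: estimate musical key and tempo with Essentia standard algorithms."""
    audio = es.MonoLoader(filename=path, sampleRate=44100)()
    key, scale, strength = es.KeyExtractor()(audio)
    bpm, _beats, beats_confidence, _, _ = es.RhythmExtractor2013(method="multifeature")(audio)
    return {
        "key": f"{key} {scale}",
        "key_strength": float(strength),
        "tempo": float(bpm),
        "tempo_confidence": float(beats_confidence),
    }
```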
Future Roadmap¶
- Add support for more audio formats and streaming protocols (e.g., MPEG-DASH alongside the existing HLS pipeline)
- Implement real-time audio processing pipelines
- Enhance waveform data with multi-resolution and zoomable formats
- Integrate machine learning models for genre and mood classification
- Improve scalability with Kubernetes and autoscaling workers
- Add user-configurable processing profiles
- Expand metadata extraction with automatic tagging
License and Related Repositories¶
- This project is licensed under the MIT License.
- Related repositories:
- rawdigs-laravel - Main Laravel backend
- rawdigs-frontend - React frontend
- rawdigs-audio-service - This audio processing microservice
For detailed contribution guidelines, see the CONTRIBUTING.md in each repository.