
February 3, 2026

R.P.M. — Rhythm Per Motion: When Sound Becomes Fluid

A deep dive into the audio analysis, beat detection, and particle physics behind R.P.M. — a rhythm game where music literally shapes the world.

Sascha Becker

15 min read

R.P.M. — When Sound Becomes Fluid

Most rhythm games ask you to press buttons when arrows scroll by. R.P.M. asks a different question: What if the music itself was the physics engine?

The concept is called Hydrodynamic Aural Shaping. You control two opposing magnetic force fields — one per thumb — and guide a stream of "audio matter" into collectors. Every particle in that stream is born from frequency data. Every movement is governed by the beat. The music doesn't accompany the gameplay. The music is the gameplay.

This article is a technical deep dive into how that works. We'll tear apart the beat detection algorithm, trace how FFT data flows into particle physics, and geek out over the signal processing that makes a kick drum widen a fluid stream while a hi-hat turns it into a laser beam.

R.P.M. in action — particles streaming, collectors filling, force fields deflecting. Everything you see is driven by the audio analysis described below.

Architecture

The system has two parallel audio processing pipelines feeding a single particle simulation:

  1. Real-time analysis — Live FFT data at 60fps driving visuals and physics
  2. Offline beat map generation — A full pre-scan of the track producing deterministic gameplay events

Both pipelines share the same core analysis logic. The real-time path keeps the visuals honest. The offline path keeps the scoring fair.

[Diagram: the real-time analysis and offline beat-map pipelines converging on applyAudio()]

Both paths converge in a single method call: applyAudio(). That's where frequency data becomes gravity, color, turbulence, and spawn events.

The Frequency Spectrum

Before anything visual happens, the raw audio signal needs to be decomposed. The system uses the Web Audio API's AnalyserNode with an FFT size of 2048, producing 1024 frequency bins at a typical 44.1kHz sample rate.
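
For reference, a minimal sketch of that setup (variable names are illustrative, not taken from the game's source):

typescript
const audioCtx = new AudioContext();
const analyser = audioCtx.createAnalyser();
analyser.fftSize = 2048;                 // frequencyBinCount becomes 1024
const dataArray = new Uint8Array(analyser.frequencyBinCount);

// (audio source -> analyser -> destination wiring omitted)
// Called every frame: fills dataArray with 0–255 magnitudes per bin
analyser.getByteFrequencyData(dataArray);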

Those bins are then grouped into four musically meaningful bands:

Band      | Range        | What lives here
Sub-Bass  | 20–60 Hz     | Physical rumble, sub frequencies
Bass      | 60–250 Hz    | Kick drums, bassline body
Mid       | 250–4000 Hz  | Vocals, snare, guitars
Treble    | 4000 Hz+     | Cymbals, hi-hats, air

Each band's energy is calculated as RMS (Root Mean Square) over its bin range:

typescript
// dataArray holds the analyser's byte frequency data (0–255 per bin)
const calculateRMS = (start: number, end: number) => {
  let sumSq = 0;
  const n = end - start;
  for (let i = start; i < end; i++) {
    const v = dataArray[i] / 255; // normalize to 0–1
    sumSq += v * v;
  }
  return Math.sqrt(sumSq / n);
};

This gives us four normalized 0–1 energy values per frame. But energy alone isn't enough to detect beats. A sustained bass note has high energy. A kick drum has high change in energy. That distinction is everything.

Beat Detection

This is the heart of the system. The approach is spectral flux with adaptive thresholding — a well-known technique in music information retrieval, but the implementation here has some clever refinements.

Step 1: Spectral Flux

Spectral flux measures frame-to-frame changes in the frequency spectrum. Only positive changes are counted — we care about onsets (attacks), not decays:

typescript
for (let i = 0; i < binCount; i++) {
  const v = dataArray[i] / 255;
  const diff = v - previousFrame[i];
  if (diff > 0) fluxSum += diff; // only count onsets, ignore decays
  previousFrame[i] = v;
}
const flux = clamp((fluxSum / binCount) * 6, 0, 1); // scale and normalize to 0–1

A kick drum causes a sharp spectral change across the low bins. A snare hits the mids. A hi-hat lights up the treble. Spectral flux captures all of these as sudden spikes, while sustained tones produce minimal flux. This is fundamentally different from volume-based beat detection, which falls apart on compressed masters where everything is loud all the time.

Step 2: Adaptive Threshold

A fixed threshold would require manual tuning per track. Instead, the system maintains a rolling history of flux values — the last 40 frames, roughly two-thirds of a second at 60 fps — and derives a dynamic threshold from the signal's own statistics:

typescript
const mean = energyHistory.reduce((a, b) => a + b) / energyHistory.length;
const variance = energyHistory
  .map(f => (f - mean) ** 2)
  .reduce((a, b) => a + b) / energyHistory.length;
const stdDev = Math.sqrt(variance);
const threshold = mean + stdDev * 1.5;

A beat is a flux value that exceeds the local mean by 1.5 standard deviations. This adapts automatically: a quiet acoustic passage has a low threshold, a wall-of-sound drop has a high one. The beats that register are always the relative spikes, not the absolute ones.

Step 3: The Low-End Gate

Here's where it gets interesting. Raw spectral flux catches every transient, including hi-hats and shakers. In most music, you don't want those triggering beat events — they'd fire 8 or 16 times per bar instead of 4.

The solution is a low-end gate:

typescript
const lowGate = subBass * 0.8 + bass * 0.6;
const brightness = 0.65 * centroid + 0.35 * treble;
const lowThreshold = 0.28 - brightness * 0.16;
const hasLowEnd = lowGate > lowThreshold;

A beat only registers if there's sufficient bass presence alongside the spectral spike. But here's the subtlety: the threshold drops as the track gets brighter. A dark, bass-heavy track needs strong bass to trigger. A bright, treble-forward track is given more leeway — because in those tracks, even moderate bass is musically significant.

Step 4: The High-Transient Exception

Some rhythmically important hits don't have bass. A snare ghost note. A cross-stick. A rimshot in a jazz quartet. The system accounts for this with a separate gate:

typescript
const hasHighTransient =
  centroid > 0.55 &&
  treble > 0.18 &&
  fluxHighRaw > threshold * 0.55;

If a transient is bright enough and strong enough relative to the current threshold, it's allowed through even without bass confirmation. This prevents the system from going silent during treble-heavy passages while still filtering out random noise.

Step 5: Cooldown

A 200ms cooldown between beats prevents double-triggering on reverb tails or multi-layered drum hits. At 120 BPM, eighth notes are 250ms apart — so the cooldown is tight enough to catch them while filtering out artifacts.

The final beat detection combines all of these:

typescript
const isBeat =
  flux > threshold &&
  (hasLowEnd || hasHighTransient) &&
  timeSinceLastBeat > 200; // ms cooldown

Here's the full decision tree in one glance:

[Diagram: beat detection decision tree (spectral flux, adaptive threshold, low-end gate, high-transient exception, cooldown)]

BPM Detection

Alongside frame-by-frame beat detection, the system also estimates the track's BPM for tempo-aware spawning. The algorithm is elegantly simple:

  1. Low-pass filter the audio at 150 Hz to isolate kick drums
  2. Peak detection in 0.5-second windows to find local maxima
  3. Interval analysis between consecutive peaks, converted to BPM
  4. Histogram voting — round each interval to the nearest BPM and count occurrences
  5. Octave normalization — constrain to 116–230 BPM to avoid half-time/double-time confusion

That last step is crucial. A 70 BPM hip-hop track will get doubled to 140 BPM. A 240 BPM drum & bass track will get halved to 120. The game needs a "playable" tempo, not a musicologically correct one.
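
Steps 3 to 5 can be sketched in a few lines, assuming peak timestamps from the low-passed signal are already available (the helper name and structure are illustrative):

typescript
// peakTimes: timestamps (in seconds) of detected low-passed kick peaks (steps 1–2)
const estimateBPM = (peakTimes: number[]): number => {
  const votes = new Map<number, number>();

  // Steps 3 + 4: intervals between consecutive peaks, rounded to BPM and counted
  for (let i = 1; i < peakTimes.length; i++) {
    const interval = peakTimes[i] - peakTimes[i - 1]; // seconds between kicks
    if (interval <= 0) continue;
    const bpm = Math.round(60 / interval);
    votes.set(bpm, (votes.get(bpm) ?? 0) + 1);
  }

  // Pick the most common candidate
  let best = 120;
  let bestCount = 0;
  for (const [bpm, count] of votes) {
    if (count > bestCount) { best = bpm; bestCount = count; }
  }

  // Step 5: octave normalization into the playable 116–230 BPM window
  while (best < 116) best *= 2;
  while (best > 230) best /= 2;
  return best;
};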

The Beat Map

For deterministic, fair gameplay, the entire track is analyzed before play begins using an OfflineAudioContext. This renders the audio at maximum speed (not real-time) and produces a BeatMapEvent[] — a timeline of every detected beat with full spectral metadata:

typescript
interface BeatMapEvent {
  time: number;      // seconds
  isBeat: boolean;
  intensity: number; // 0–1
  subBass: number;
  bass: number;
  mid: number;
  treble: number;
  centroid: number;  // spectral brightness
  lane: number;      // 0–3
}
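
One way to run the same analysis offline is to schedule suspend points on an OfflineAudioContext and sample the analyser at each one while the track renders faster than real time. A sketch under that assumption, with analyzeFrame standing in for the shared per-frame analysis:

typescript
const buildBeatMap = async (buffer: AudioBuffer): Promise<BeatMapEvent[]> => {
  const offline = new OfflineAudioContext(
    buffer.numberOfChannels, buffer.length, buffer.sampleRate
  );
  const source = offline.createBufferSource();
  const analyser = offline.createAnalyser();
  analyser.fftSize = 2048;
  source.buffer = buffer;
  source.connect(analyser);
  analyser.connect(offline.destination);
  source.start();

  const bins = new Uint8Array(analyser.frequencyBinCount);
  const events: BeatMapEvent[] = [];
  const hop = 1 / 60; // same 60 Hz cadence as the real-time path

  // Schedule a suspend at every analysis frame before rendering starts
  for (let t = hop; t < buffer.duration; t += hop) {
    offline.suspend(t).then(() => {
      analyser.getByteFrequencyData(bins);
      events.push(analyzeFrame(bins, t)); // shared analysis logic (stand-in name)
      offline.resume();
    });
  }

  await offline.startRendering();
  return events;
};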

The lane assignment deserves its own section.

Lane Assignment

The game has four lanes mapped to two hands (left: lanes 0–1, right: lanes 2–3). The assignment algorithm optimizes for playability:

Hand selection uses a dual strategy:

  • Fast sequences (< 350ms between notes): Force alternating hands to prevent fatigue
  • Slow sequences: Use spectral centroid — bright hits go right, dark hits go left

Lane selection within each hand uses frequency balance:

  • Left hand: Bass-heavy → lane 0, mid-heavy → lane 1
  • Right hand: Treble-heavy → lane 3, mid-heavy → lane 2

Anti-repetition: If the chosen lane matches the previous lane, there's a 60% chance of swapping to the other lane within the same hand.
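
Put together, the assignment could look roughly like this; thresholds and the exact tie-breaking are illustrative, not lifted from the game:

typescript
const assignLane = (
  evt: BeatMapEvent,
  prev: { time: number; lane: number } | null
): number => {
  // Hand selection: force alternation on fast sequences, otherwise follow brightness
  let rightHand: boolean;
  if (prev !== null && (evt.time - prev.time) * 1000 < 350) {
    rightHand = prev.lane < 2;      // previous hit was left, so go right (and vice versa)
  } else {
    rightHand = evt.centroid > 0.5; // bright hits go right, dark hits go left
  }

  // Lane selection within the hand, based on frequency balance
  let lane: number;
  if (rightHand) {
    lane = evt.treble > evt.mid ? 3 : 2;             // treble-heavy → lane 3, mid-heavy → lane 2
  } else {
    lane = evt.bass + evt.subBass > evt.mid ? 0 : 1; // bass-heavy → lane 0, mid-heavy → lane 1
  }

  // Anti-repetition: 60% chance to swap to the sibling lane within the same hand
  if (prev !== null && lane === prev.lane && Math.random() < 0.6) {
    lane = lane % 2 === 0 ? lane + 1 : lane - 1;
  }
  return lane;
};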

[Diagram: lane assignment (hand selection, lane selection, anti-repetition)]

The result is a beat map that feels musical. Bass hits land on the left. Cymbals land on the right. Fast passages alternate hands. It mirrors how a drummer's limbs actually work.

Audio to Physics

Now we reach the particle system — a 1600+ line Canvas 2D engine where every parameter is wired to the audio analysis. The applyAudio() method is called every frame and translates frequency data into physical properties.

The mapping is many-to-many — but it doesn't wire raw bands directly to visuals. The code blends them into composite signals first, which then drive the rendering:

Composite Signal | Audio Inputs              | Drives
Heaviness        | Sub-Bass + Bass           | Gravity, Stream Width
Busyness         | Spectral Flux + Treble    | Speed, Turbulence
Sparkle          | Centroid + Treble         | Hue, Lightness
Temperature      | Volume + Bass + Mid       | Hue
Beat Pulse       | Beat Intensity (decaying) | Gravity, Spawn Rate
Volume           | (direct)                  | Saturation, Stream Width, Spawn Rate
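
The weights aren't spelled out above, so treat the following as illustrative blends rather than the game's actual coefficients; they only show the shape of the composites:

typescript
// Illustrative weights; the table names the inputs, not the exact mix
const heaviness = clamp(subBass * 0.6 + bass * 0.4, 0, 1);
const busyness  = clamp(flux * 0.7 + treble * 0.3, 0, 1);
const sparkle   = clamp(centroid * 0.6 + treble * 0.4, 0, 1);
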
Stream Width

The particle stream's width is treated as fluid pressure:

typescript
const baseWidth = 0.35 - volume * 0.2;
const kickExpansion = bass * 0.15;
const rumbleExpansion = subBass * 0.15;
streamWidth = clamp(baseWidth + kickExpansion + rumbleExpansion, 0.1, 0.5);

  • Quiet passages: Wide, lazy stream (35% of canvas)
  • Loud treble: Narrow, focused beam (15%)
  • Kick drum hit: Sudden expansion — the stream "punches" outward
  • Sub-bass rumble: Sustained widening, like a pressure wave

This creates a visual metaphor that's immediately legible. You can see the kick drum in the stream's behavior before you consciously hear it.

Speed and Gravity

Particle velocity and gravity are driven by two composite signals:

typescript
const speedDriver = 0.55 * busyness + 0.25 * relativeLoudness + 0.2 * treble;
const gravityDriver = 0.55 * heaviness + 0.35 * beatPulse + 0.1 * relativeLoudness;
baseSpeed = (0.5 + speedDriver * 5.0) * motionScale;
gravity = (0.05 + gravityDriver * 0.1) * motionScale;

Where busyness is derived from spectral flux + treble, and heaviness from sub-bass + bass. The effect: bass-heavy drops feel heavy — particles accelerate downward. Busy treble passages feel frantic — particles move faster but with less gravity, almost floating.

The beatPulse value deserves attention. It's a decaying envelope that peaks on each beat:

typescript
beatPulse = Math.max(beatPulse * 0.85, beatIntensity);

Fast attack (instant jump to beat intensity), slow exponential decay (15% per frame). This creates a rhythmic "breathing" in the gravity that you feel more than see.

Color Temperature

The particle hue follows a temperature model:

typescript
const temperature = volume * 0.4 + bass * 0.4 + mid * 0.2 - subBass * 0.2;
const targetHue = 180 + temperature * 180;
baseHue += (targetHue - baseHue) * 0.1; // smooth transition

  • Cold (cyan, ~180°): Quiet, sub-bass dominated passages
  • Warm (red, ~360°): Loud, bass-and-mid dominated passages

The smooth transition (EMA with alpha 0.1) prevents jarring color jumps. A drop builds warmth over several frames rather than snapping red instantly.

Particle lightness is treble-reactive (bright sounds → bright particles), and saturation tracks volume (loud → neon, quiet → pastel). The combination means you can close your eyes and almost reconstruct the mix from the visual output alone.

Force Fields and Collision

The player's two force fields use exponential falloff repulsion:

typescript
const normalizedDist = dist / activeRadius;
const falloff = Math.exp(-normalizedDist * 2);
const force = baseStrength * falloff;
particle.vx += (dx / dist) * force * 0.5;
particle.vy += (dy / dist) * force * 0.3;

Active fields are 5x stronger than passive ones and expand their radius by 50%. The asymmetric damping (horizontal 0.5 vs vertical 0.3) preserves the downward stream flow while allowing lateral deflection. Particles don't orbit the fields — they're guided past them.
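
The active-state boost described above could be expressed as simply as this (field names are illustrative):

typescript
// Active (pressed) fields push 5x harder and reach 50% further than passive ones
const baseStrength = field.active ? fieldStrength * 5 : fieldStrength;
const activeRadius = field.active ? fieldRadius * 1.5 : fieldRadius;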

Collision detection between particles and collectors uses straightforward AABB (axis-aligned bounding box) checks. Each collected particle increments the collector's fill by 1.2%. When fill reaches the target threshold (60%), the collector completes and triggers a celebration burst.
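
A sketch of that collection loop, with the shapes and helpers (particles, collectors, spawnCelebrationBurst, beatIntensity) as illustrative stand-ins:

typescript
interface Particle { x: number; y: number; radius: number; dead: boolean }
interface Collector { x: number; y: number; width: number; height: number; fill: number; complete: boolean }

// AABB overlap between a particle (treated as a small box) and a collector rectangle
const hits = (p: Particle, c: Collector): boolean =>
  p.x + p.radius > c.x && p.x - p.radius < c.x + c.width &&
  p.y + p.radius > c.y && p.y - p.radius < c.y + c.height;

for (const p of particles) {
  for (const c of collectors) {
    if (c.complete || p.dead || !hits(p, c)) continue;
    p.dead = true;
    c.fill += 0.012;     // each collected particle adds 1.2%
    if (c.fill >= 0.6) { // 60% target: completion plus celebration burst
      c.complete = true;
      spawnCelebrationBurst(c, beatIntensity);
    }
  }
}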

The celebration burst is itself audio-reactive: beat intensity determines particle count (0–130 particles), speed (10–30 px/frame), and size (2–6 px). Completing a collector during a drop produces a dramatically larger explosion than during a quiet passage.
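
And the burst itself, mapping beat intensity linearly across the stated ranges (again a sketch; burstParticles is a stand-in):

typescript
// Beat intensity (0–1) scales the celebration burst
const spawnCelebrationBurst = (c: Collector, intensity: number) => {
  const count = Math.round(130 * intensity); // 0–130 particles
  for (let i = 0; i < count; i++) {
    const angle = Math.random() * Math.PI * 2;
    const speed = 10 + 20 * intensity;       // 10–30 px/frame
    burstParticles.push({
      x: c.x + c.width / 2,
      y: c.y + c.height / 2,
      vx: Math.cos(angle) * speed,
      vy: Math.sin(angle) * speed,
      size: 2 + 4 * intensity,               // 2–6 px
    });
  }
};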

Collector Spawning

Collectors don't appear randomly. Their placement is driven by the beat map's spectral content:

Horizontal positioning: Bass-heavy beats spawn collectors on the left side, treble-heavy beats on the right. This mirrors natural stereo imaging — you're catching kick drums with your left thumb and hi-hats with your right.

Vertical spacing: A "ladder" algorithm prevents overlapping. Same-side spawns are offset vertically by 15–25% of canvas height. Different-side spawns reset to center.

The "Drop" mechanic: When bass exceeds 0.8 and mid exceeds 0.6 simultaneously (a full-frequency hit), the system spawns two collectors mirrored across the center. Both thumbs are needed. It's the game's equivalent of a drum fill.

Background Bars

Behind the gameplay, 64 frequency bars provide a persistent FFT visualization. The mapping uses a power-law scale (exponent 2.5) to compress the logarithmic frequency range into linear screen space — giving sub-bass and bass proportionally more visual real estate than they'd get from a linear mapping.

Each bar has independent attack and decay physics. High-frequency bars snap to new values instantly (attack speed 1.0) while low-frequency bars are deliberately sluggish (attack speed 0.8), mimicking how bass physically resonates longer than treble.

A sub-bass hover effect reduces gravity on the lowest bars by up to 80%, making them float at their peak values during sustained rumble. Visually, the low-end bars "breathe" while the treble bars flicker.
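
A sketch of the bar-to-bin mapping and the per-bar attack/decay; the exponent and the two attack speeds come from the text, while the interpolation between them and the hover cutoff are illustrative:

typescript
// barValues, barDecay, binCount and subBass are assumed to live in the engine's scope
const BAR_COUNT = 64;

for (let i = 0; i < BAR_COUNT; i++) {
  // Power-law mapping (exponent 2.5): the low end gets far more bars than a linear split
  const start = Math.floor(Math.pow(i / BAR_COUNT, 2.5) * binCount);
  const end = Math.max(start + 1, Math.floor(Math.pow((i + 1) / BAR_COUNT, 2.5) * binCount));
  const target = calculateRMS(start, end);

  // Low bars rise sluggishly (attack ~0.8), high bars snap (attack ~1.0)
  const attack = 0.8 + 0.2 * (i / (BAR_COUNT - 1));

  if (target > barValues[i]) {
    barValues[i] += (target - barValues[i]) * attack;
  } else {
    // Sub-bass hover: sustained rumble reduces gravity on the lowest bars by up to 80%
    const hover = i < 8 ? 1 - 0.8 * subBass : 1;
    barValues[i] = Math.max(0, barValues[i] - barDecay * hover);
  }
}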

The 64 frequency bars in action — bass bars on the left hovering with sub-bass energy, treble bars on the right flickering with hi-hat transients.

Adaptive Quality

The system scales itself to the device. A motionScale factor normalizes physics to a reference height of 850px. On a small phone in landscape (~350px), physics run at 55% intensity — fewer particles, weaker forces, tighter streams. On a tablet in portrait (~900px), everything runs at full scale.
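
One way to express that normalization, consistent with the numbers above (an assumption, not the confirmed formula):

typescript
// 850px reference height; the 0.55 floor keeps a ~350px landscape phone at 55% intensity
const motionScale = Math.min(1, Math.max(0.55, canvasHeight / 850));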

FX quality drops a tier on small, high-DPI screens (where the GPU is working harder per pixel). Glow effects, secondary ring animations, and extra particle layers are the first to go.

What Makes This Work

The secret isn't any single algorithm. It's the layering. Beat detection feeds collector spawning. Frequency bands feed particle physics. Spectral centroid feeds color temperature. Volume feeds spawn rate. Every audio feature maps to multiple visual parameters, and every visual parameter blends multiple audio features.

The result is a system where you don't just hear the music — you see its structure. The kick drum widens the stream. The hi-hat tightens it. The drop spawns collectors on both sides. The breakdown fades the particles to ghostly pastels. The build-up accelerates everything.

You're not pressing buttons in time with the music. You're shaping a fluid that is the music.

And that's a fundamentally different kind of rhythm game.

Play R.P.M. yourself at rpm.saschb2b.com — headphones recommended.


Written by
Sascha Becker