⚡ Latency

Understanding the latency numbers

⏱ 7 min read · Intermediate
The essentials in 15 seconds
  • Real-time audio latency can't physically be zero — there's always an incompressible minimum of 15-25 ms.
  • The number in My Studios (e.g. "~26 ms") = your one-way contribution toward the server. The number on a musician's mixer strip inside a studio = the true note→ear latency between the two of you.
  • The 25 ms threshold is the physical limit of synchronous perception established by research (Stanford CCRMA). Above it, your ear starts to notice a delay.

Why do I see "26 ms" and not "0 ms"?

Because latency is physics, not software. Between the moment your guitar string vibrates and the moment the sound comes out of your friend's headphones, the signal goes through several stages — each with its own incompressible processing time.

[Diagram: YOU (capture + send), ~15-25 ms → NETWORK + RELAY, ~2-6 ms → YOUR FRIEND (receive), ~15-25 ms]
From your guitar string to your partner's ear, the signal travels through 3 zones — typical total 35-55 ms in native agent mode.

Here are the 5 audio stages (on your computer's side) that come on top of the network latency:

  • Capture (~1-3 ms): your mic / instrument → your audio interface's input buffer.
  • Opus encoding (~3-7 ms): compressing the signal into packets you can send over the network (real-time audio codec).
  • Jitter buffer (~8-15 ms): a small "reserve" on the receiving side that absorbs network variations (otherwise it crackles).
  • Audio output (~1-3 ms): your audio interface's output buffer feeding the DAC (digital → analog conversion).
  • Hardware (~1-3 ms): the USB / Thunderbolt path between your computer, your audio interface and your headphones.
  • Audio total (~15-25 ms): incompressible with a pro setup (external audio interface + native Jamodio agent).

On top of these ~20 ms comes the network: the time your audio packets take to reach the nearest Jamodio server. On a residential connection in mainland Europe toward our European servers, it's typically 2-3 ms one-way (measured, not estimated).

Total displayed ≈ 25 ms. That's the ballpark you see in My Studios, and it's already very close to the theoretical physical limit for a residential connection.
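One way to read the numbers above is to take the midpoint of each stage range and add the midpoint of the 2-3 ms network figure. The sketch below does only that arithmetic. It is an illustration, not how Jamodio computes the badge, and the variable names are made up for the example.

```python
# Stage ranges in milliseconds, copied from the breakdown above.
AUDIO_STAGES_MS = {
    "capture": (1, 3),
    "opus_encoding": (3, 7),
    "jitter_buffer": (8, 15),
    "audio_output": (1, 3),
    "hardware": (1, 3),
}
# One-way network time quoted for a residential connection in mainland Europe.
NETWORK_ONE_WAY_MS = (2, 3)


def midpoint(low_high):
    """Return the middle of a (low, high) range."""
    low, high = low_high
    return (low + high) / 2


audio_mid = sum(midpoint(r) for r in AUDIO_STAGES_MS.values())
total_mid = audio_mid + midpoint(NETWORK_ONE_WAY_MS)

print(f"audio midpoint: {audio_mid} ms, with network: {total_mid} ms")
```

The midpoints land on the same ballpark as the displayed total, which is why a badge around 25 ms is what a well-tuned setup should expect.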

💡
To go below 25 ms, you'd either need to be physically inside the same datacenter as our servers (impossible from home), or use an audio interface with ultra-short buffers (ASIO 32 samples, ~1 ms) plus a direct fiber link to our POPs. At that point we're talking about marginal tuning.

"My Studios" vs. inside the studio — these are not the same numbers

Jamodio shows two latency numbers in different places. They measure two semantically distinct things.

[Diagram: MY STUDIOS: your signal → server, theoretical estimate, badge ~25 ms (your one-way contribution). INSIDE THE STUDIO: friend A → server → you, badge ~35-45 ms (true note→ear, A→you).]
The My Studios badge = your theoretical one-way contribution toward the server. The badge on a musician's strip inside the studio = the true note→ear latency from them to you.
📂 My Studios

~25 ms yellow

Shown before entering a studio, on each card. It's your one-way contribution toward our servers + your local audio chain.

your network + your audio
(pre-session estimate)

Useful for: picking the nearest server / checking your connection is viable before committing to a session.

🎛️ A musician's strip

~35 ms yellow

Shown during the session, on each musician's mixer strip. This is the true note→ear latency between them and you.

their audio + their network + relay +
your network + your jitter buffer + your audio

Useful for: understanding whether you're in phase with this specific musician. Hover the badge to see the breakdown.

So if you see "26 ms" in My Studios and "37 ms" on your friend's badge inside the studio, it's not an error. They measure different things. The inequality rule: the note→ear value (musician's strip) is always higher than your contribution alone (My Studios badge), because it adds up their chain AND yours.
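The inequality can be sketched with a small model. The six fields mirror the note→ear breakdown listed on the musician's strip. The values are purely illustrative (chosen to reproduce the 26 ms / 37 ms example, not measured), and counting your jitter buffer as part of your local audio chain for the My Studios estimate is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Illustrative per-component latencies in ms (not measured values)."""
    their_audio: float
    their_network: float
    relay: float
    your_network: float
    your_jitter_buffer: float
    your_audio: float

    def my_studios_estimate(self) -> float:
        # Assumption: the jitter buffer counts as part of your local audio chain.
        return self.your_network + self.your_jitter_buffer + self.your_audio

    def note_to_ear(self) -> float:
        # Their chain and the relay, stacked on top of your own contribution.
        return (self.their_audio + self.their_network + self.relay
                + self.my_studios_estimate())


path = Path(their_audio=7.0, their_network=2.5, relay=1.5,
            your_network=2.5, your_jitter_buffer=10.0, your_audio=13.5)

print(path.my_studios_estimate(), path.note_to_ear())
```

Because the other musician's terms and the relay are never negative, the note→ear value in this model cannot drop below your own contribution.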

What if there are 3 musicians or more?

Each musician in the studio has their own strip with their own badge. The calculation is redone independently for each one, based on THEIR network distance to our servers + yours. So you don't see an average number: you see the real latency with each musician individually.

Concrete example — you in Paris, in a 3-person session:

Musician | Their location | Their badge in your mixer | Verdict
🎸 Pierre | Paris | ~24 ms | green
🎹 Paul | Paris | ~24 ms | green
🥁 Jacques | Toulouse | ~45 ms | orange

Practical consequence: you instantly know who you can groove with and with whom you'll have to compensate. If Jacques drifts on fast tracks, it's their latency — not their playing.
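The verdict column follows directly from the badge thresholds used in this article (below 25 ms, 25 to 40 ms, 40 ms and above). A minimal sketch of that mapping, applied to the session above; the function name `badge_color` is hypothetical and not part of any Jamodio API:

```python
def badge_color(latency_ms: float) -> str:
    """Map a note-to-ear latency to a badge color using the article's thresholds."""
    if latency_ms < 25:
        return "green"
    if latency_ms < 40:
        return "yellow"
    return "orange"


# Badge values from the Paris example above.
session = {"Pierre": 24, "Paul": 24, "Jacques": 45}
verdicts = {name: badge_color(ms) for name, ms in session.items()}

for name, color in verdicts.items():
    print(f"{name}: {color}")
```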

Constants vs live measurements: what actually moves?

You might think every number is computed live. The reality is more nuanced:

Component | How it's obtained | Varies live?
Network (one-way) | Measured continuously via WebSocket pings to our servers, rolling median over the most recent samples. | ✅ Yes, updated every 1.5 s.
Jitter buffer (agent) | Read live from the native agent: the adaptive target adjusted in real time based on incoming stream quality. | ✅ Yes, updated at 1 Hz.
Audio output (browser) | Read live via AudioContext.outputLatency in your browser. | ✅ Yes, real value.
Mic capture | Conservative estimate based on the mode (agent or browser). | ⚠️ Constant.
Opus encode / decode | Estimate: Opus 20 ms frame ≈ ~2.5 ms encode + ~2.5 ms decode. | ⚠️ Constant.
Hardware (USB/TB) | Estimate: ~2 ms typical for USB 2.0 / Thunderbolt. | ⚠️ Constant.

The 3 values marked "constant" are pessimistic estimates calibrated on the most common setups. Reality may differ by ±5 ms around the displayed number depending on your exact audio interface. But the 3 dynamic values are measured live — they're what makes your badge move when your connection moves.
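To illustrate why a rolling median suits the network measurement, here is a minimal sketch. The window size and the "one-way = round trip / 2" rule are assumptions made for the example (the article does not specify either), and the class name is hypothetical.

```python
from collections import deque
from statistics import median


class RollingOneWayLatency:
    """Rolling median over recent ping round-trip times, reported as one-way ms."""

    def __init__(self, window: int = 9):
        # Only the most recent `window` samples are kept (window size is assumed).
        self._rtts = deque(maxlen=window)

    def add_ping(self, rtt_ms: float) -> float:
        self._rtts.append(rtt_ms)
        # Assumption: the path is symmetric, so one-way is half the round trip.
        return median(self._rtts) / 2


tracker = RollingOneWayLatency()
for rtt in [5, 5, 6, 40, 5]:  # one 40 ms spike among steady ~5 ms pings
    one_way = tracker.add_ping(rtt)

print(one_way)
```

A median ignores an isolated spike that an average would smear across the display, so the badge only moves when the connection degrades for several samples in a row.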

🔬 Going deeper — why not measure everything live?

Measuring capture and hardware in real time would require either a calibration loop at startup (= user friction), or a call to the RTCRtpReceiver.getStats() API every 5 s. The latter adds an asynchronous call cycle to the real-time audio pipeline — which we refuse on principle: no new runtime process should touch the audio path.

The trade-off we picked: pessimistic constants for what's roughly stable (capture, Opus, hardware), live measurements for what actually moves (network, jitter buffer, output). An honest display is worth more than false precision.

Wi-Fi vs Ethernet: a myth to debunk

A lot of musicians think Wi-Fi adds latency. That's only partly true: what Wi-Fi mainly adds is jitter, and the average latency can be the same as on Ethernet.

In short: on good Wi-Fi, your badge can perfectly display the same value as on Ethernet. But as soon as a neighbor fires up Netflix or your access point gets saturated, the badge climbs within seconds — and you start hearing crackles.

👉 For the details, see Ethernet vs Wi-Fi.

Agent mode vs browser mode: why the 40 ms gap?

Jamodio works in two modes:

Mode | Capture | Opus | Jitter | Output | Audio total
Native agent (macOS / Windows) | 1-3 ms | 3-7 ms | 8-15 ms | 1-3 ms | ~15-25 ms
Browser only | 20-30 ms | 8-15 ms | 15-30 ms | 4-8 ms | ~55-70 ms

The browser adds 30 to 50 ms of incompressible overhead compared to the native agent. It's not a Jamodio flaw: it's a built-in limitation of the browser's audio stack.

👉 That's why Jamodio ships a downloadable native agent. It fully bypasses the browser stack and talks directly to the OS's low-level audio APIs. Gain: dozens of ms that turn a session from "frustrating" into "enjoyable".

⚠️
In browser-only mode, the 25 ms Ensemble Performance Threshold (EPT) is physically unreachable. You can still play, but with a perceptible delay. It's useful for trying out / discovering Jamodio, not for serious sessions. The "yellow" or "orange" badge in browser mode isn't a bug: it's an honest reflection of physical reality.

On Windows: why ASIO changes everything

If you're on Windows, there's a technical detail worth understanding — it isn't specific to Jamodio, it applies to every online jam tool (JackTrip, FarPlay, JamKazam, Jamulus…). Windows offers two very different audio paths:

Audio stack | When it's used | Buffer latency | Who benefits
WASAPI Shared (Windows default) | All consumer apps: Chrome, Spotify, Zoom, Discord. The Windows sound engine mixes the outputs of every app. | ~10 ms floor | Built-in sound card, onboard jack, HDMI, USB without ASIO driver.
ASIO (manufacturer driver) | Pro music software (Cubase, Ableton, Reaper, FL Studio…) and Jamodio's agent when an ASIO driver is installed. | ~2-3 ms | External USB / Thunderbolt audio interfaces (Focusrite, MOTU, Behringer, Steinberg, RME…) with their official driver.

The difference comes from the Windows sound engine. In WASAPI Shared mode (the default), Windows mixes internally with a fixed 480-sample buffer (= 10 ms at 48 kHz), impossible to go below. It's an OS limitation, not a vendor choice. ASIO completely bypasses this mixer by talking directly to the hardware via the manufacturer's driver.
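The conversion from buffer size to latency is simple arithmetic: frames divided by sample rate. The sketch below checks the 480-sample figure above and, for comparison, the 32-sample ASIO buffer mentioned earlier in this article. The function name is illustrative.

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int = 48_000) -> float:
    """Time needed to fill (or drain) one audio buffer, in milliseconds."""
    return frames * 1000 / sample_rate_hz


wasapi_shared = buffer_latency_ms(480)  # fixed Windows mixer buffer
asio_32 = buffer_latency_ms(32)         # ultra-short ASIO buffer

print(f"WASAPI Shared: {wasapi_shared} ms, ASIO 32 samples: {asio_32:.2f} ms")
```

Each buffer in the chain costs at least this much, which is why a fixed 480-sample mixer sets a floor that no driver tuning can remove.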

💡
Practical consequence for Jamodio. Without an external audio interface with ASIO drivers, a Windows user pays around 10-14 ms of audio overhead compared to a Mac or Windows+ASIO equivalent. Jamodio still works — the agent handles this case with an automatic fallback — but you lose precious headroom on the note-to-ear total. On a Paris↔Paris duet, that's the difference between a green and yellow badge.

Every Jamodio competitor recommends ASIO on Windows: it's the standard recipe for online jamming. The good news: it doesn't need a big budget — a Focusrite Scarlett Solo (~$110), a Behringer UMC22 (~$50) or a Steinberg UR22C (~$150) is more than enough, ASIO drivers included by the manufacturer.

👉 For the detailed ASIO driver install procedure, see Audio interface · Windows ASIO section.

The 25 ms threshold — why this exact value?

You may have noticed that Jamodio badges go from green to yellow at 25 ms. This value is not arbitrary: it comes from academic research on Networked Music Performance.

The Ensemble Performance Threshold (EPT) has been experimentally demonstrated by several independent studies, including research at Stanford CCRMA.

The Jamodio thresholds (green below 25 ms, yellow from 25 to 40 ms, orange at 40 ms and above) are aligned with this literature.

🔬 Going deeper — latency in an actual room

You might think: "OK, 25 ms online, but in a real room we're at 0 ms, right?". No. Sound travels at ~340 m/s through air. Between two musicians 3 meters apart = ~9 ms of natural acoustic latency. On a concert stage with a bass player 10 m from the drummer = ~30 ms. Musicians have been compensating for this forever.

Jamodio at 25 ms one-way ≈ playing with someone 8 meters away in a room. Totally workable. It's even the norm in 90% of live stage contexts.
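These room figures can be checked with the same ~340 m/s speed of sound. A small sketch, with illustrative function names:

```python
SPEED_OF_SOUND_M_PER_S = 340  # approximate speed of sound in air, as used above


def acoustic_delay_ms(distance_m: float) -> float:
    """Time for sound to travel a given distance through air, in ms."""
    return distance_m * 1000 / SPEED_OF_SOUND_M_PER_S


def equivalent_distance_m(latency_ms: float) -> float:
    """Distance at which acoustic delay alone would equal a given latency."""
    return latency_ms * SPEED_OF_SOUND_M_PER_S / 1000


print(f"3 m apart:  {acoustic_delay_ms(3):.1f} ms")
print(f"10 m apart: {acoustic_delay_ms(10):.1f} ms")
print(f"25 ms is like standing {equivalent_distance_m(25)} m apart")
```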

Going further

Now that you understand the numbers, the three articles below give you the concrete levers to optimize your latency:

1. Why an Ethernet cable stabilizes your badge even when the average latency is the same. Real-world effects of jitter.
2. 5 concrete points to check before each session: Ethernet, audio interface, wired headphones, agent running, plugged-in laptop.
3. The component with the biggest impact on your audio latency (capture + output).