Kotlin Coroutines vs. Java Virtual Threads: The Ultimate Concurrency Showdown
Do we still need Coroutines with Project Loom? A high-performance stress test comparing Java Virtual Threads and Kotlin Coroutines under tight memory configurations.

Kotlin Coroutines vs. Java Virtual Threads: The Ultimate Concurrency Showdown (With Resource Constraints!)
Hey everyone, welcome back! Grab your coffee—or a hot cup of chai—because today we are diving into something really exciting.
If you work on the JVM, you already know the massive hype around Java 21 and Project Loom. Virtual Threads are finally here, promising to make high-concurrency applications super smooth without the heavy memory footprint of traditional OS threads.
But as Kotlin developers, we have a big question: Do we still need Coroutines? Or does Project Loom kill them? To find out, I decided to stop guessing and write a proper JMH (Java Microbenchmark Harness) benchmark.
And because we deploy to the real world (hello, tiny Docker containers!), we didn't just test scaling the tasks—we tested scaling the machine too. Let's look at the data!
The Setup: Creating the Battleground
We wrote a benchmark to simulate a classic I/O delay—like a database call or an API request taking exactly 10 milliseconds. We tested four different approaches:
- The Old Way: Java Platform Threads (capped at 1,000 to save our RAM).
- The New Way: Pure Java Virtual Threads (Project Loom).
- The Kotlin Way: Native Kotlin Coroutines using delay().
- The Hybrid Way: Kotlin Coroutines running on top of a Loom Virtual Thread Dispatcher.
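The full benchmark lives in the repo, but here is a minimal sketch of three of the four shapes (the capped Platform Threads variant is omitted; `simulateIo` and the function names are illustrative, and the Loom variants assume JDK 21 plus kotlinx-coroutines):

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger

// Simulated I/O: a 10 ms pause standing in for a DB or API call.
suspend fun simulateIo() = delay(10)

// The Kotlin Way: N coroutines on the default Dispatchers.IO pool.
fun nativeCoroutines(tasks: Int): Int {
    val done = AtomicInteger()
    runBlocking {
        repeat(tasks) {
            launch(Dispatchers.IO) { simulateIo(); done.incrementAndGet() }
        }
    } // runBlocking waits for all child coroutines
    return done.get()
}

// The Hybrid Way: the same coroutines, dispatched onto Loom virtual threads.
fun hybridCoroutines(tasks: Int): Int {
    val done = AtomicInteger()
    Executors.newVirtualThreadPerTaskExecutor().use { exec ->
        val loomDispatcher = exec.asCoroutineDispatcher()
        runBlocking {
            repeat(tasks) {
                launch(loomDispatcher) { simulateIo(); done.incrementAndGet() }
            }
        }
    }
    return done.get()
}

// The New Way: pure virtual threads, where a blocking Thread.sleep is cheap.
fun pureVirtualThreads(tasks: Int): Int {
    val done = AtomicInteger()
    Executors.newVirtualThreadPerTaskExecutor().use { exec ->
        repeat(tasks) { exec.submit { Thread.sleep(10); done.incrementAndGet() } }
    } // use {} closes the executor, which waits for submitted tasks to finish
    return done.get()
}
```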
To make this super realistic, we tested a light load of 1,000 tasks and a heavy load of 50,000 tasks. Then, we ran this entire matrix on two different machine profiles:
- High-Res Profile: 4GB Heap RAM, 4 CPU Cores (Like a solid backend server)
- Low-Res Profile: 512MB Heap RAM, 1 CPU Core (Like a cheap cloud container)
Our Sweet Gradle Config
To automate this, we configured our build.gradle.kts to run separate processes for each profile and output the data to JSON files. Here is the magic snippet we used:
```kotlin
// JMH: profile via -PjmhProfile=low|high (default = default)
val jmhProfile = findProperty("jmhProfile")?.toString() ?: "default"

jmh {
    fork = 5
    warmupIterations = 3
    iterations = 5
    benchmarkMode = listOf("thrpt")
    resultFormat = "JSON"

    when (jmhProfile) {
        "low" -> {
            resultsFile.set(project.file("build/reports/jmh/results-low.json"))
            jvmArgs.addAll(listOf("-Xmx512m", "-XX:ActiveProcessorCount=1"))
        }
        "high" -> {
            resultsFile.set(project.file("build/reports/jmh/results-high.json"))
            jvmArgs.addAll(listOf("-Xmx4g", "-XX:ActiveProcessorCount=4"))
        }
        else -> resultsFile.set(project.file("build/reports/jmh/results.json"))
    }
}

// Custom tasks to run them sequentially!
tasks.register("jmhLowRes") { /* runs low profile */ }
tasks.register("jmhHighRes") { /* runs high profile */ }
tasks.register("jmhAll") {
    dependsOn("jmhLowRes", "jmhHighRes")
}
```
Pro Tip: We dropped the generated JSON files into jmh.morethan.io to easily visualize the charts. Highly recommend checking that tool out!
The Results: Who Survived the Stress Test?
Here is the throughput (Operations per Second - higher is better). Let's see what happens when we push the system to its absolute limits.
High-Res Machine (4 CPU, 4GB RAM)
| Approach | 1k Tasks (ops/s) | 50k Tasks (ops/s) | Verdict |
|---|---|---|---|
| Pure Java Virtual Threads | 69.37 | 34.56 | Loom handles 50k beautifully. |
| Coroutines + Loom (Hybrid) | 75.28 | 33.31 | The Sweet Spot! Matches pure Java speed. |
| Kotlin Coroutines (Native) | 52.84 | 0.44 | Crashed. Bottlenecked by default IO pool. |
| Java Platform Threads | 39.24 | 1.43 | Crashed. OS threads are too heavy. |
Low-Res Machine (1 CPU, 512MB RAM)
| Approach | 1k Tasks (ops/s) | 50k Tasks (ops/s) | Verdict |
|---|---|---|---|
| Pure Java Virtual Threads | 70.92 | 24.48 | The Survival King. Still scaling on 1 CPU! |
| Coroutines + Loom (Hybrid) | 70.94 | 16.82 | Started choking. The "Kotlin wrapper" overhead shows up here. |
| Kotlin Coroutines (Native) | 53.61 | 0.48 | Flatlined. |
| Java Platform Threads | 37.71 | 1.41 | Flatlined. |
Decoding the Data: What's the Story Here?
1. The 50k Queue of Death
Look at Platform Threads and Native Coroutines at 50,000 tasks. They collapse to around 1 op/s or less, regardless of whether they have 4 CPUs or 1! When 50k tasks fight over a limited thread pool, they form a massive queue. It's like trying to squeeze Bangalore traffic through a single lane: throwing more RAM at it doesn't fix the jam.
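The back-of-the-envelope math makes the jam obvious. For blocking 10 ms tasks on a capped pool, throughput is bounded by pool size alone (a sketch; Dispatchers.IO defaults to max(64, #cores) worker threads):

```kotlin
// Lower bound for finishing `tasks` blocking jobs of `taskMs` each
// on a pool of `poolSize` threads. RAM never appears in the formula.
fun queueLowerBoundMs(tasks: Int, poolSize: Int, taskMs: Long = 10): Long =
    (tasks / poolSize) * taskMs

fun main() {
    // 50k tasks on the default 64-thread IO pool: seconds of pure queueing.
    println(queueLowerBoundMs(50_000, 64))     // 7810 ms
    // 50k tasks on the 1,000-thread platform pool: still a long line.
    println(queueLowerBoundMs(50_000, 1_000))  // 500 ms
}
```

Virtual threads dodge this bound entirely because there is no fixed pool: every task gets its own (nearly free) thread.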
2. The Hybrid Approach Wins for Most Backends
Look at the High-Res (4 CPU) table. When you take Kotlin Coroutines and run them on a Loom-backed Dispatcher, you get ~33 ops/s at 50k tasks. This is neck-and-neck with Pure Java Virtual Threads (~34 ops/s). You pay virtually zero "performance tax", but you get to keep Kotlin's amazing Structured Concurrency and suspend syntax. Absolute magic!
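Wiring up the hybrid is a one-liner with kotlinx-coroutines' `asCoroutineDispatcher()`. A minimal sketch, assuming JDK 21 (`hybridSum` is just an illustrative name):

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.Executors

// Run `n` blocking 10 ms tasks as coroutines backed by virtual threads.
fun hybridSum(n: Int): Int =
    Executors.newVirtualThreadPerTaskExecutor().use { exec ->
        // Each coroutine gets its own virtual thread, so blocking calls
        // (JDBC, legacy HTTP clients) no longer starve the dispatcher.
        val loomDispatcher = exec.asCoroutineDispatcher()
        runBlocking {
            List(n) { i -> async(loomDispatcher) { Thread.sleep(10L); i } }
                .awaitAll() // structured concurrency: parent waits for children
                .sum()
        }
    }

fun main() {
    println(hybridSum(1_000)) // 0 + 1 + ... + 999 = 499500
}
```

You keep cancellation, timeouts, and scoping from coroutines, while Loom makes the blocking parts cheap underneath.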
3. Starvation Mode: Pure Loom Takes the Crown
This is the big revelation! Look at the Low-Res (1 CPU) table at 50k tasks. When we starved the JVM of resources, Pure Java Virtual Threads stayed strong at 24 ops/s, while the Hybrid Coroutines+Loom model dropped to 16 ops/s.
Why? Because every coroutine allocates extra objects: a Job, a Continuation, and dispatcher bookkeeping. With only 512MB of RAM and a single CPU core, the Garbage Collector has to work overtime cleaning up those allocations, and that GC time comes straight out of your throughput. Pure Loom skips that layer entirely!
The Final Verdict
Does Project Loom kill Kotlin Coroutines? Absolutely not.
If you are building a standard backend microservice with decent CPU/RAM, use the Hybrid Approach. Back your Kotlin Coroutines with a Virtual Thread executor to get the cleanest code and top-tier performance.
But, if you are running background jobs on edge devices, IoT hardware, or the cheapest, tiniest cloud containers possible... skip the Coroutines and use Pure Java Virtual Threads. When resources are tight, Loom's raw JVM integration is unbeatable.
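For that pure-Loom path you don't even need the coroutines dependency. A sketch straight from Kotlin on JDK 21 (`runOnVirtualThreads` is an illustrative name):

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Pure Loom with zero extra dependencies: one virtual thread per task,
// then join them all. No Job or Continuation objects for the GC to chase.
fun runOnVirtualThreads(tasks: Int): Int {
    val done = AtomicInteger()
    val threads = List(tasks) {
        Thread.ofVirtual().start {
            Thread.sleep(10)   // blocking is fine: only the virtual thread parks
            done.incrementAndGet()
        }
    }
    threads.forEach { it.join() }
    return done.get()
}

fun main() {
    println(runOnVirtualThreads(10_000)) // 10000
}
```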
If you want to run these benchmarks on your own machine, I've pushed the complete setup to my GitHub: Check out the source code here!
Keep coding, and see you in the next one!