Methodology

By Marcus, independent poker bot operator

The bench: one room (PPPoker), eight bot products tested under thirteen profile configurations, identical network environment per seat, ~22,000 dealt hands per setup-equivalent. Bench window: autumn 2025 through winter 2026 (last hands January–February 2026). Below is how the test was constructed and what I did not control for.

When the test ran, and why it ended

The bench started in autumn 2025 and the last hands were dealt in January–February 2026. The window did not close because I ran out of hands or budget. It closed because PPPoker shipped a major update to its Windows client, and the entire OpenHoldem-family field (Warbot, Shanky, Inhuman, PokerBot.com) could no longer attach to the updated client at all. Four of eight products were effectively dead on the bench in the same week. Continuing the test on broken products would have produced misleading numbers, so I stopped, locked the dataset, and wrote it up. Every price, ban event, build version, and winrate figure in this comparison reflects what those products looked like during that window.

Why PPPoker

Every product on the list either ships native PPPoker support or has a community profile for it. That's a low bar (most of these bots don't run on the regulated tier-one rooms like PokerStars or GGPoker at all), but it made PPPoker the only room where the whole field would start in the same place. For the OpenHoldem-family products (Inhuman, Warbot, Shanky, PokerBot.com profiles), PPPoker has well-maintained table maps. For the Android-emulator products (NZT Poker, PokerBotAI), PPPoker is among their primary supported rooms. For 3upgaming, PPPoker is the first room in their advertised list. And for Deepermind, I added PPPoker support manually because the engine's OCR layer was adaptable.

The benches

One physical host. Multiple Windows VMs for the OpenHoldem-family products (each product gets its own clean image). One Android emulator host configuration for the emulator products. All proxies routed through a single residential-grade IP pool — same vendor, same region distribution. FakeGPS-style location randomisation enabled on every Android instance, GPS coordinates correlated with each IP's location. Session schedules randomised across the day, no seat running more than four hours continuous.

Configuring each product

Stock configurations first. Every product was initially tested in its out-of-the-box recommended profile. For products that ship multiple profiles (Shanky/BonusBots, Warbot), I tested at least two distinct profiles per product. Warbot got four (UltraGTO, Snowball, the default shipped profile, and one community-published profile). PokerBot.com got two of its stock OpenHoldem/Shanky reseller profiles. The vendor-side products (NZT, PokerBotAI, 3upgaming) were configured per the vendor's documentation, with all anti-detection options enabled.

Where vendors offered guidance on proxy / GPS / behavior, I followed it. The point of this test was not to handicap any product by skipping its own recommendations. If a vendor said "use mobile IPs," I used mobile IPs. If they said "limit to 4-hour sessions," I limited to 4-hour sessions. Anyone who tells you a bot's success is independent of operator discipline has not actually run one.

Stakes and the opponent pool

Why NL2–NL4. The bench ran at NL2 and NL4 cash on PPPoker, plus a thin slice of MTT play where the product advertised MTT support. The stake choice was deliberate: a bot's decision logic does not change between micro and mid-stakes. The strategy file, the table map, the CFR weights, all of it computes the same action regardless of blind size. What does change with stake is the per-hand fuel/gas cost on the metered products (NZT Poker and PokerBotAI both price fuel as a function of stake and room). Running the bench at NL2–NL4 minimised fuel spend across thirteen configurations and ~22,000 hands per setup-equivalent without changing what any product was actually trying to do. This is also why the "realistic ceiling" notes for each product (NL50 for Warbot, micro-stakes for Shanky, and so on) come from each vendor's own published guidance and forum reception, not from observed bench performance at higher stakes.

Who was at the table. The opponent mix is what showed up on those NL2–NL4 PPPoker clubs — a working pool of recreational players, some grinders, a known but unquantified bot presence from competing operators. Same opponent profile across all tested products, because the seats were rotated across the same tables on overlapping schedules. Several seats from different bot products were observed playing each other in the same game during the test window; that's part of why the comparison is meaningful — they all faced the same competition, including each other.

Hand counting

~22,000 dealt hands per setup-equivalent. "Setup-equivalent" means a fresh seat with a fresh account on a fresh proxy, run on one product+profile configuration, until either it reached the hand target or got banned, whichever came first. Banned setups are counted as their pre-ban hand count plus a noted ban event. The two products that did not finish negative (NZT Poker near zero, PokerBotAI positive) both completed the full hand target without being banned during the test window.
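The counting rule above can be written down as a small record type. This is a minimal sketch, not the tooling used on the bench: the class name, field names, and the exact 22,000 cutoff are assumptions made for illustration (the text only gives an approximate target), and the product and profile strings in the usage note are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed exact value; the text gives the target only as "~22,000".
HAND_TARGET = 22_000


@dataclass
class SetupEquivalent:
    """One product+profile configuration, counted until target or ban."""
    product: str
    profile: str
    hands_dealt: int = 0
    banned_at_hand: Optional[int] = None  # None means no ban was recorded

    @property
    def finished(self) -> bool:
        # A setup stops at the hand target or at a ban, whichever comes first.
        return self.banned_at_hand is not None or self.hands_dealt >= HAND_TARGET

    def record_hands(self, n: int) -> None:
        """Add dealt hands; ignore anything after a ban or past the target."""
        if self.finished:
            return
        self.hands_dealt = min(self.hands_dealt + n, HAND_TARGET)

    def record_ban(self) -> None:
        """Freeze the setup at its pre-ban hand count and note the ban."""
        if self.banned_at_hand is None:
            self.banned_at_hand = self.hands_dealt


def summarise(setup: SetupEquivalent) -> str:
    """Return a one-line status string for a setup."""
    if setup.banned_at_hand is not None:
        status = "banned"
    elif setup.hands_dealt >= HAND_TARGET:
        status = "completed"
    else:
        status = "in progress"
    return f"{setup.product}/{setup.profile}: {setup.hands_dealt} hands, {status}"
```

For example, a setup created as SetupEquivalent("ProductA", "stock") that records 5,000 hands and then a ban stays at 5,000 hands with a noted ban event, which matches the "pre-ban hand count plus a noted ban event" rule.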

What I did not control for

Several things, honestly. The opponent pool drifted week-over-week: I cannot prove the same fish density at the start of the test as at the end. The order in which products were tested means later products faced a marginally more bot-saturated pool. The proxy provider had two outages during the window, which I treated as missing data rather than excluding the affected seats. And I am one person with one bench. Replicating this on a second operator's bench would produce slightly different numbers, possibly different rankings within the bottom half of the table.

What I'm confident in: the gap between the top two (NZT Poker, PokerBotAI) and the rest is large enough that bench noise does not explain it. PokerBotAI was clearly positive, NZT sat close to zero, and everyone else was clearly negative.
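One way to put a number on "bench noise" is the usual standard-error calculation for a winrate in big blinds per 100 hands. This is a hedged sketch rather than the analysis behind the figures in this comparison: the function names are hypothetical, hands are treated as independent, and the standard deviation has to be supplied by the reader, since the text does not report one.

```python
import math


def winrate_ci(winrate_bb100, stdev_bb100, hands, z=1.96):
    """Approximate confidence interval for a winrate in bb/100.

    stdev_bb100 is the standard deviation per 100-hand block (an input the
    reader must supply or estimate from their own hand histories).
    """
    blocks = hands / 100
    std_err = stdev_bb100 / math.sqrt(blocks)
    return winrate_bb100 - z * std_err, winrate_bb100 + z * std_err


def gap_exceeds_noise(wr_a, wr_b, stdev_bb100, hands_a, hands_b, z=1.96):
    """True if the gap between two winrates is larger than z standard errors.

    Assumes both samples share the same per-100-hand standard deviation.
    """
    se_diff = stdev_bb100 * math.sqrt(100 / hands_a + 100 / hands_b)
    return abs(wr_a - wr_b) > z * se_diff


# Illustrative only: 90 bb/100 is an assumed standard deviation, not a
# measured value from the bench.
low, high = winrate_ci(0.0, 90.0, 22_000)
```

Under that assumed standard deviation, the 95% interval on a 22,000-hand sample is roughly plus or minus 12 bb/100, so a gap between two setups would need to be well above that before it can be separated from variance.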

What I did not test

The club-side services that several products advertise (filling seats at a club owner's tables, generating rake) were not in scope. That is a fundamentally different evaluation — the question there is whether the club owner makes money, not whether the bot does — and it requires cooperation with a club owner to run honestly. Both NZT and PokerBotAI advertise this; 3upgaming advertises it; PokerBot.com mentions it. None of those services are reflected in the comparison.

I also did not test on PokerStars, GGPoker, or any other tier-one regulated room. The Android-emulator products do not support them, and the OpenHoldem-family products that nominally do get caught faster than the test window permits.

The result one more time

Read it however you want. ~22,000 hands per setup-equivalent on PPPoker, thirteen configurations across eight vendors, one careful bench. One product finished positive (PokerBotAI). One product broke even (NZT Poker). Five were negative. One was a scam (3upgaming). For the per-criterion scoring, see criteria. For the full per-product writeup, return to the comparison.