About the Algorithm
The first computer program to outplay human professionals at
heads-up no-limit Hold'em poker
In a study completed in December 2016 and involving 44,852
hands of poker, DeepStack defeated 11 professional poker players, with only one outside
the margin of statistical significance. Over all games played, DeepStack won 49 big
blinds/100 (always folding would only lose 75 bb/100), over four standard deviations
from zero, making it the first computer program to beat professional poker players in
heads-up no-limit Texas hold'em poker.
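To make those units concrete, here is a minimal Python sketch of how a win rate in big blinds per 100 hands (bb/100) and its distance from zero in standard deviations can be computed from per-hand results. The win_rate_summary helper and the synthetic data are purely illustrative; this is not the study's actual analysis.

```python
# Minimal sketch (not DeepStack's analysis code): computing a win rate in
# bb/100 and its distance from zero in standard deviations from a list of
# per-hand results measured in big blinds. The data below is synthetic.
import numpy as np

def win_rate_summary(per_hand_bb: np.ndarray) -> tuple[float, float]:
    """Return (win rate in bb/100, number of standard errors above zero)."""
    mean = per_hand_bb.mean()                               # average bb won per hand
    std_err = per_hand_bb.std(ddof=1) / np.sqrt(len(per_hand_bb))
    return mean * 100.0, mean / std_err

# Hypothetical example: 44,852 hands drawn from a noisy, slightly positive distribution.
rng = np.random.default_rng(0)
hands = rng.normal(loc=0.49, scale=25.0, size=44_852)
rate, sigmas = win_rate_summary(hands)
print(f"{rate:.1f} bb/100, {sigmas:.1f} standard deviations from zero")
```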
Games are serious business
Don’t let the name fool you: “games” of imperfect information provide a general mathematical model that
describes how decision-makers interact. AI research has a long history of using parlour
games to study these models, but attention has been focused primarily on perfect
information games like checkers, chess, and Go. Poker is the quintessential game of
imperfect information, where you and your opponent each hold information that the other
cannot see (your private cards).
Until now, competitive AI approaches in imperfect
information games have typically reasoned about the entire game, producing a complete
strategy prior to play. However, to make this approach feasible in heads-up no-limit
Texas hold’em—a game with vastly more unique situations than there are atoms in the
universe—a simplified abstraction of the game is often needed.
A fundamentally different approach
DeepStack is the first theoretically sound application of heuristic
search methods—which have been famously successful in games like checkers, chess, and
Go—to imperfect information games.
At the heart of DeepStack is continual re-solving, a
sound local strategy computation that only considers situations as they arise during
play. This lets DeepStack avoid computing a complete strategy in advance, skirting the
need for explicit abstraction.
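The sketch below is a schematic of that idea rather than the published implementation: at each decision point only the current subgame is solved, and the only state carried forward is the agent's own range over private hands and the opponent's counterfactual values. The toy_solve_subgame stub, the three-hand deck, and the action set are placeholder assumptions standing in for the CFR-based solver.

```python
# Schematic sketch of the continual re-solving loop (not the published
# implementation). Only the current subgame is solved; the state carried
# between decisions is our range over private hands plus the opponent's
# counterfactual values, which constrain the next re-solve.
import numpy as np

N_HANDS = 3                      # toy number of private hands
ACTIONS = ["fold", "call", "raise"]

def toy_solve_subgame(own_range, opponent_cfvs):
    """Placeholder for the real CFR-based subgame solver: returns a uniform
    strategy for every hand and unchanged opponent values per action."""
    strategy = np.full((N_HANDS, len(ACTIONS)), 1.0 / len(ACTIONS))
    next_cfvs = {a: opponent_cfvs.copy() for a in ACTIONS}
    return strategy, next_cfvs

def act(own_range, opponent_cfvs, hand_index, rng):
    # 1. Re-solve only the subgame in front of us; nothing about the rest
    #    of the game is computed or stored in advance.
    strategy, next_cfvs = toy_solve_subgame(own_range, opponent_cfvs)
    # 2. Play the computed strategy for the hand we actually hold.
    action_idx = rng.choice(len(ACTIONS), p=strategy[hand_index])
    action = ACTIONS[action_idx]
    # 3. Update our own range by Bayes' rule for the chosen action, and keep
    #    the opponent values needed to constrain the next re-solve.
    own_range = own_range * strategy[:, action_idx]
    own_range /= own_range.sum()
    return action, own_range, next_cfvs[action]

rng = np.random.default_rng(0)
own_range = np.full(N_HANDS, 1.0 / N_HANDS)
opponent_cfvs = np.zeros(N_HANDS)
action, own_range, opponent_cfvs = act(own_range, opponent_cfvs, hand_index=0, rng=rng)
print(action, own_range)
```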
During re-solving, DeepStack doesn’t need to reason
about the entire remainder of the game because it substitutes computation beyond a
certain depth with a fast approximate estimate, DeepStack’s “intuition”: a gut feeling
of the value of holding any possible private cards in any possible poker
situation.
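A minimal sketch of that depth-limited idea, under simplifying assumptions: expand the tree only a few actions deep and replace everything beyond the limit with a fast learned estimate. The Node class, the stub value_net, and the simple averaging over children are illustrative; the real lookahead runs a CFR-style solve over ranges of hands rather than averaging single values.

```python
# Illustrative depth-limited evaluation: search a few levels deep, then let a
# learned value estimate stand in for the rest of the game. The averaging over
# children is a simplification, not the CFR solve DeepStack actually runs.
from dataclasses import dataclass, field

@dataclass
class Node:
    payoff: float = 0.0                      # exact payoff if the node is terminal
    children: list["Node"] = field(default_factory=list)

def value_net(node: Node) -> float:
    """Stand-in for the trained "intuition": a real network would map the
    poker situation (ranges, pot, board) to estimated values."""
    return 0.0

def evaluate(node: Node, depth: int, limit: int) -> float:
    if not node.children:                    # terminal node: exact payoff
        return node.payoff
    if depth == limit:                       # depth limit: ask the value network
        return value_net(node)
    # Otherwise keep searching below this node.
    values = [evaluate(child, depth + 1, limit) for child in node.children]
    return sum(values) / len(values)

# Tiny illustrative tree: one continuation ends the game, the other goes deeper.
root = Node(children=[Node(payoff=1.0),
                      Node(children=[Node(payoff=2.0), Node(payoff=-1.0)])])
print(evaluate(root, depth=0, limit=1))      # second branch is cut off at the limit
```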
Finally, DeepStack’s intuition, much like human intuition, needs to be
trained. We train it with deep learning using examples generated from random poker
situations.
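A hedged sketch of what such training could look like, written in PyTorch: generate random poker situations, feed them to a value network, and regress onto target values. The random_situation generator below returns random targets as a placeholder (a real pipeline would solve each random situation to obtain its true counterfactual values), and the feature layout and layer sizes are illustrative assumptions, not the published architecture.

```python
# Illustrative training loop for the "intuition" network (assumed feature
# layout and sizes). Inputs are both players' ranges plus a pot size;
# outputs are per-hand value estimates.
import torch
from torch import nn

N_HANDS = 1326                       # number of two-card private holdings

def random_situation(batch: int):
    """Placeholder generator: random ranges and pot sizes as inputs, random
    targets as outputs. A real generator would solve each random situation
    to obtain its true counterfactual values."""
    ranges = torch.rand(batch, 2 * N_HANDS)      # both players' ranges
    pot = torch.rand(batch, 1)                   # normalized pot size
    x = torch.cat([ranges, pot], dim=1)
    y = torch.randn(batch, 2 * N_HANDS)          # stand-in target values
    return x, y

value_net = nn.Sequential(                       # the learned "intuition"
    nn.Linear(2 * N_HANDS + 1, 512), nn.PReLU(),
    nn.Linear(512, 512), nn.PReLU(),
    nn.Linear(512, 2 * N_HANDS),
)
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
loss_fn = nn.HuberLoss()

for step in range(100):                          # illustrative number of steps
    x, y = random_situation(batch=64)
    loss = loss_fn(value_net(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```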
DeepStack is theoretically sound, produces strategies substantially more
difficult to exploit than abstraction-based techniques, and defeats professional poker
players at heads-up no-limit poker with statistical significance.