
Paper: “Mastering the game of Go with deep neural networks and tree search”


Recently, the paper “Mastering the game of Go with deep neural networks and tree search” was published in Nature. It describes a neural network architecture that, for the first time, beat a professional Go player (Fan Hui, 2-dan) without handicap, making it the strongest Go engine in existence. A match between the engine – named AlphaGo – and Lee Se-dol is scheduled between the 5th and 15th of March 2016. AlphaGo is developed by Google DeepMind.

It was difficult for me to follow the formal details, as I lack experience in machine learning. Here I would just like to quote the most interesting passages:

[…] These games may be solved by recursively computing the optimal value function in a search tree containing approximately b^d possible sequences of moves, where b is the game’s breadth (number of legal moves per position) and d is its depth (game length). In large games, such as chess (b ≈ 35, d ≈ 80) and especially Go (b ≈ 250, d ≈ 150), exhaustive search is infeasible, but the effective search space can be reduced by two general principles. First, […] position evaluation […] Second […] sampling actions from a policy p(a|s) that is a probability distribution over possible moves a in position s.
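To get a feeling for these numbers, here is a small back-of-the-envelope computation (my own illustration, not from the paper) that turns the quoted breadth and depth figures into orders of magnitude:

```python
import math

def log10_positions(b: int, d: int) -> float:
    """log10 of b**d, i.e. the order of magnitude of the naive search space."""
    return d * math.log10(b)

print(f"chess: ~10^{log10_positions(35, 80):.0f} move sequences")    # ~10^124
print(f"Go:    ~10^{log10_positions(250, 150):.0f} move sequences")  # ~10^360
```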

[…] evaluating positions using a value network, and sampling actions using a policy network.
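How exactly the two networks plug into the tree search is spelled out in the paper’s Methods; the following is only my own rough, hypothetical sketch of a PUCT-style selection rule of the form Q(s,a) + u(s,a), with u(s,a) proportional to P(s,a)/(1 + N(s,a)), where the prior P comes from the policy network and Q accumulates the backed-up evaluations (value network and/or rollout results). The data layout and the constant c_puct are my assumptions:

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    prior: float            # P(s, a), supplied by the policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of backed-up leaf evaluations

    @property
    def q(self) -> float:   # mean action value Q(s, a)
        return self.value_sum / self.visits if self.visits else 0.0

def select_action(children: dict, c_puct: float = 5.0):
    """Pick the action maximizing Q(s,a) + c_puct * P(s,a) * sqrt(N_total) / (1 + N(s,a))."""
    n_total = sum(c.visits for c in children.values())
    return max(
        children,
        key=lambda a: children[a].q
        + c_puct * children[a].prior * math.sqrt(n_total) / (1 + children[a].visits),
    )
```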

We trained a 13-layer policy network […] from 30 million positions from the KGS Go Server.

We also tested against the strongest open-source Go program, Pachi.

The final version of AlphaGo used 40 search threads, 48 CPUs, and 8 GPUs. We also implemented a distributed version of AlphaGo that exploited multiple machines, 40 search threads, 1202 CPUs and 176 GPUs. […] The distributed version of AlphaGo was significantly stronger, winning 75% of games against single-machine AlphaGo and 100% of its games against other programs.

A policy p(a|s) is a probability distribution over legal actions a ∈ A(s). A value function is the expected outcome if all actions for both players are selected according to policy p, that is, v^p(s) = E[z_t | s_t = s, a_{t…T} ~ p].
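Read naively, this definition suggests a Monte Carlo estimate: sample moves from p for both players until the game ends and average the outcomes z. A small sketch of that idea (my own; the game environment is passed in as hypothetical callables):

```python
import random

def rollout_value(state, policy, step, is_terminal, outcome, n_games: int = 100) -> float:
    """Monte Carlo estimate of v^p(s) = E[z_t | s_t = s, a_{t..T} ~ p].

    policy(s)      -> dict mapping each legal action a in A(s) to p(a|s)
    step(s, a)     -> successor state after playing a in s
    is_terminal(s) -> True when the game is over
    outcome(s)     -> terminal reward z (e.g. +1 for a win, -1 for a loss)
    """
    total = 0.0
    for _ in range(n_games):
        s = state
        while not is_terminal(s):
            actions, weights = zip(*policy(s).items())
            s = step(s, random.choices(actions, weights=weights)[0])
        total += outcome(s)
    return total / n_games
```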

Our rollout policy p_π(a|s) contains less handcrafted knowledge than state-of-the-art Go programs.

Time controls for formal games were 1h main time plus three periods of 30s byoyomi.

The neural network architecture itself is described in the Methods appendix. The (formal and informal) game results are provided as a data sheet. And in Methods/Evaluation a closing parenthesis is missing in the denominator of the logistic function 😉
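For reference, the standard Elo-style logistic win probability looks as follows (my own sketch with the conventional 400-point scale; the paper’s exact parametrization may differ, and the ratings below are purely illustrative):

```python
def win_probability(elo_a: float, elo_b: float) -> float:
    """Probability that player a beats player b, given Elo ratings on the usual 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))

# e.g. a 230-point rating gap corresponds to roughly a 79% expected win rate
print(round(win_probability(2900, 2670), 2))
```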

