
A3C in Atari Pong-v0



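Atari Pong in versus mode: the game's built-in program plays on the left, and the A3C model being trained plays on the right. A game is played to 21 points, and a player scores one point each time the opponent misses the ball. The log shows the A3C model starting out by losing every game 0 to 21 (episode reward -21). After about 2 hours of training it has turned this around and wins almost every game, occasionally by as many as 13 points.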
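The episode reward in the log is the point difference (points the model scored minus points the game program scored), so the final score of a finished game can be recovered from it. The following is a minimal sketch of that arithmetic; the function name `final_score` is hypothetical, and it assumes the game ran until one side reached 21. That assumption may not hold for every episode: the two episodes in the log with reward 0.0 both have length exactly 10000, which suggests they were cut off by a step limit before either side reached 21.

```python
def final_score(episode_reward, winning_points=21):
    """Return (model_points, opponent_points) for a game played to completion.

    Assumes the episode ended because one side reached `winning_points`;
    it does not apply to episodes cut off early by a step limit.
    """
    diff = int(episode_reward)
    if diff == 0:
        raise ValueError("a finished game cannot end with a reward of 0")
    if diff > 0:
        # The model won: it has 21 points, the opponent has 21 - diff.
        return winning_points, winning_points - diff
    # The model lost: the opponent has 21 points, the model has 21 + diff.
    return winning_points + diff, winning_points


print(final_score(13.0))   # (21, 8)
print(final_score(-21.0))  # (0, 21)
```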


Below is the log from training the A3C model:

(base) frank@viper1:~/a3c$ python main.py --env-name "Pong-v0" --num-processes 8

Time 00h 00m 10s, episode reward -21.0, episode length 1026
Time 00h 01m 18s, episode reward -21.0, episode length 1020
Time 00h 02m 26s, episode reward -21.0, episode length 1029
Time 00h 03m 35s, episode reward -21.0, episode length 1023
Time 00h 04m 42s, episode reward -21.0, episode length 1014
Time 00h 05m 50s, episode reward -21.0, episode length 1087
Time 00h 07m 00s, episode reward -21.0, episode length 1359
Time 00h 08m 14s, episode reward -16.0, episode length 1922
Time 00h 09m 30s, episode reward -14.0, episode length 2220
Time 00h 10m 48s, episode reward -15.0, episode length 2431
Time 00h 12m 04s, episode reward -15.0, episode length 2211
Time 00h 13m 23s, episode reward -8.0, episode length 2600
Time 00h 14m 38s, episode reward -15.0, episode length 2138
Time 00h 15m 55s, episode reward -14.0, episode length 2266
Time 00h 17m 15s, episode reward -14.0, episode length 2373
Time 00h 18m 36s, episode reward -11.0, episode length 2986
Time 00h 20m 04s, episode reward -13.0, episode length 3985
Time 00h 21m 25s, episode reward -16.0, episode length 2866
Time 00h 22m 42s, episode reward -14.0, episode length 2336
Time 00h 24m 04s, episode reward -13.0, episode length 2941
Time 00h 25m 36s, episode reward -10.0, episode length 4295
Time 00h 27m 00s, episode reward -9.0, episode length 3529
Time 00h 28m 27s, episode reward -6.0, episode length 3395
Time 00h 30m 06s, episode reward -9.0, episode length 5174
Time 00h 31m 30s, episode reward -16.0, episode length 3308
Time 00h 33m 01s, episode reward -12.0, episode length 4429
Time 00h 34m 36s, episode reward -5.0, episode length 4915
Time 00h 36m 12s, episode reward 7.0, episode length 5206
Time 00h 37m 51s, episode reward -5.0, episode length 5437
Time 00h 39m 20s, episode reward -11.0, episode length 4120

Time 00h 41m 11s, episode reward -6.0, episode length 5296
Time 00h 42m 57s, episode reward -4.0, episode length 5941
Time 00h 44m 36s, episode reward -10.0, episode length 5001
Time 00h 46m 20s, episode reward -8.0, episode length 6140
Time 00h 48m 26s, episode reward -2.0, episode length 8778
Time 00h 50m 29s, episode reward 1.0, episode length 8327
Time 00h 52m 02s, episode reward -9.0, episode length 4356
Time 00h 53m 52s, episode reward -9.0, episode length 6798
Time 00h 55m 23s, episode reward -7.0, episode length 4086
Time 00h 57m 36s, episode reward 2.0, episode length 9581
Time 00h 59m 40s, episode reward 1.0, episode length 8471
Time 01h 01m 38s, episode reward 1.0, episode length 7759
Time 01h 03m 23s, episode reward -8.0, episode length 6118
Time 01h 05m 19s, episode reward -7.0, episode length 7803
Time 01h 07m 11s, episode reward 7.0, episode length 7340
Time 01h 09m 20s, episode reward 0.0, episode length 10000
Time 01h 11m 26s, episode reward -3.0, episode length 9585
Time 01h 13m 17s, episode reward -1.0, episode length 7136
Time 01h 15m 16s, episode reward -6.0, episode length 7788
Time 01h 17m 08s, episode reward -4.0, episode length 6754
Time 01h 19m 03s, episode reward 8.0, episode length 7267
Time 01h 21m 02s, episode reward 2.0, episode length 8403
Time 01h 23m 08s, episode reward 2.0, episode length 8935
Time 01h 25m 15s, episode reward 1.0, episode length 8634
Time 01h 27m 02s, episode reward 7.0, episode length 6527
Time 01h 28m 53s, episode reward -7.0, episode length 7202
Time 01h 31m 08s, episode reward 0.0, episode length 10000
Time 01h 33m 07s, episode reward 3.0, episode length 8596
Time 01h 35m 05s, episode reward 4.0, episode length 7680
Time 01h 36m 55s, episode reward 1.0, episode length 6875

Time 01h 39m 08s, episode reward 2.0, episode length 7490
Time 01h 40m 50s, episode reward 9.0, episode length 5853
Time 01h 42m 53s, episode reward 4.0, episode length 8869
Time 01h 44m 41s, episode reward 7.0, episode length 6371
Time 01h 46m 31s, episode reward -3.0, episode length 6642
Time 01h 48m 23s, episode reward 9.0, episode length 7192
Time 01h 50m 09s, episode reward 2.0, episode length 6351
Time 01h 51m 54s, episode reward 11.0, episode length 6137
Time 01h 53m 36s, episode reward 5.0, episode length 5978
Time 01h 55m 28s, episode reward 3.0, episode length 7244
Time 01h 57m 15s, episode reward 8.0, episode length 6572
Time 01h 58m 59s, episode reward 8.0, episode length 5973
Time 02h 00m 52s, episode reward 4.0, episode length 6814
Time 02h 02m 47s, episode reward 8.0, episode length 7276
Time 02h 04m 29s, episode reward 5.0, episode length 5431
Time 02h 06m 15s, episode reward 13.0, episode length 5686
Time 02h 08m 01s, episode reward 6.0, episode length 6625
Time 02h 09m 57s, episode reward 5.0, episode length 7139
Time 02h 12m 05s, episode reward 4.0, episode length 8886
Time 02h 13m 49s, episode reward 12.0, episode length 5931
Time 02h 15m 33s, episode reward 9.0, episode length 5734
Time 02h 17m 17s, episode reward 7.0, episode length 6030
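Single-episode rewards are noisy (for example, 7.0 at 00h 36m is followed by -11.0 at 00h 39m), so the trend is easier to see with a running average. The sketch below parses lines in the format shown above and smooths the rewards. The helper names `parse_log` and `moving_average` and the window size are illustrative choices, not part of the referenced repository.

```python
import re

# Matches lines such as:
# "Time 00h 36m 12s, episode reward 7.0, episode length 5206"
LOG_LINE = re.compile(
    r"Time (\d+)h (\d+)m (\d+)s, "
    r"episode reward (-?\d+(?:\.\d+)?), episode length (\d+)"
)


def parse_log(text):
    """Return a list of (elapsed_seconds, reward, length) tuples."""
    rows = []
    for line in text.splitlines():
        m = LOG_LINE.search(line)
        if m is None:
            continue  # skip blank lines and the shell command line
        h, mi, s, reward, length = m.groups()
        elapsed = int(h) * 3600 + int(mi) * 60 + int(s)
        rows.append((elapsed, float(reward), int(length)))
    return rows


def moving_average(values, window=10):
    """Trailing average over at most `window` most recent values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


sample = """\
Time 00h 00m 10s, episode reward -21.0, episode length 1026
Time 00h 36m 12s, episode reward 7.0, episode length 5206

Time 02h 06m 15s, episode reward 13.0, episode length 5686
"""
rows = parse_log(sample)
rewards = [reward for _, reward, _ in rows]
print(rows[-1])                            # (7575, 13.0, 5686)
print(moving_average(rewards, window=2))   # [-21.0, -7.0, 10.0]
```

To use it on the full run, save the log above to a text file and pass its contents to `parse_log`.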


Reference:
https://github.com/nailo2c/a3c
