
DQN-Atari-Enduro porting


Notes on running the following repo inside a Docker container:
https://github.com/matrixBT/DQN-Atari-Enduro



Dockerfile:
FROM tensorflow/tensorflow:1.12.0-gpu-py3 

#https://github.com/yanpanlau/DDPG-Keras-Torcs
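# Copy the build context (the cloned repo) into the image and work from there;
# this is the same path that `docker run` mounts and passes as --workdir.
ADD . /home/DQN-Atari-Enduro
WORKDIR /home/DQN-Atari-Enduro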

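# Refresh the package index in the same layer as the install so a cached, stale index is never reused
RUN apt-get update && apt-get install -y vim xautomation torcs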
RUN apt-get install -y libjpeg-dev cmake swig python-pyglet python3-opengl libboost-all-dev \
        libsdl2-2.0.0 libsdl2-dev libglu1-mesa libglu1-mesa-dev libgles2-mesa-dev \
        freeglut3 xvfb libav-tools

#optional gtags
RUN apt-get install -y exuberant-ctags libncurses-dev

RUN pip install jupyter scipy gym
RUN pip install "gym[atari]"
RUN pip install keras-rl
RUN pip install https://pypi.python.org/packages/68/c3/300c6f92b21886b0fe42c13f3a39a06c6cb90c9fbb1b71da85fe59091a7d/pyglet-1.2.4-py3-none-any.whl#md5=08e6404a678f91b4eee85eb33b028d88 
ENV PATH="/usr/games:${PATH}"

CMD ["/bin/bash"]
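The `docker run` command later in this post expects an image tagged `tensorflow/tensorflow:DQN-Atari-Enduro`, but the build step is not shown. A minimal sketch, assuming the Dockerfile above is saved in the root of the cloned repo and the command is run from that directory (the `IMAGE` variable and the guard around `docker` are only illustrative):

```shell
# Tag taken from the `docker run` command used in this post.
IMAGE="tensorflow/tensorflow:DQN-Atari-Enduro"

# Build from the current directory, which is assumed to hold the Dockerfile.
if command -v docker >/dev/null 2>&1; then
    docker build -t "$IMAGE" .
else
    echo "docker not found; install Docker before building" >&2
fi
```

Reusing the `tensorflow/tensorflow` repository name for a local image works, but a personal name would avoid confusion with the official images.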

Launch Jupyter Notebook inside the Docker container:
1. On the host, start the container with GPU support, X11 forwarding, the repo mounted as a volume, and port 8888 published:
(~/rl/DQN-Atari-Enduro) viper1 $ docker run --runtime=nvidia -it -e DISPLAY=$DISPLAY -e XAUTHORITY=$XAUTHORITY -v /tmp/.X11-unix:/tmp/.X11-unix -v /home/frank/rl/DQN-Atari-Enduro:/home/DQN-Atari-Enduro -v /var/run/docker.sock:/var/run/docker.sock --workdir /home/DQN-Atari-Enduro -p 8888:8888 tensorflow/tensorflow:DQN-Atari-Enduro /bin/bash

2. Inside the container, start the notebook server:
root@9da3183dad9e:/home/DQN-Atari-Enduro# jupyter notebook --ip 0.0.0.0 --no-browser --allow-root --port 8888

3. On the host, open a browser and go to:
http://localhost:8888

4. When the login page asks for it, enter the token printed in the output of step 2.
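If the token from step 2 has scrolled out of view, it can be looked up again from the host. A sketch, assuming the container was started from the image tag used above and is still running (the variable names are illustrative; `jupyter notebook list` prints the URL of each running server, including its token):

```shell
IMAGE="tensorflow/tensorflow:DQN-Atari-Enduro"

if command -v docker >/dev/null 2>&1; then
    # ID of the first running container created from the image.
    CONTAINER="$(docker ps -q --filter "ancestor=$IMAGE" | head -n 1)"
    if [ -n "$CONTAINER" ]; then
        # List running notebook servers and their token URLs.
        docker exec "$CONTAINER" jupyter notebook list
    else
        echo "no running container found for $IMAGE" >&2
    fi
fi
```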
