
COMP9414 assignment writing service: Python programming

Date: 2024-07-21  Source: hfw.cc (Hefei Net)  Author: hfw.cc



COMP9414 24T2
Artificial Intelligence
Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment,
you are asked to implement Q-learning and SARSA methods for a taxi nav-
igation problem. To run your experiments and test your code, you should
make use of the Gym library [1], an open-source Python library for developing
and comparing reinforcement learning algorithms. You can install Gym on
your computer simply by using the following command in your command
prompt:
pip install gym
In the taxi navigation problem, there are four designated locations in the
grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
episode starts, one taxi starts off at a random square and the passenger is
at a random location (one of the four specified locations). The taxi drives
to the passenger’s location, picks up the passenger, drives to the passenger’s
destination (another one of the four specified locations), and then drops off
the passenger. Once the passenger is dropped off, the episode ends. To show
the taxi grid world environment, you can use the following code:
[1] https://www.gymlibrary.dev/environments/toy_text/taxi/
import gym

env = gym.make("Taxi-v3", render_mode="ansi").env
state = env.reset()  # Gym >= 0.26 returns an (observation, info) tuple here
rendered_env = env.render()
print(rendered_env)
To render the environment, there are three modes, known as “human”,
“rgb_array”, and “ansi”. The “human” mode visualizes the environment in a
way suitable for human viewing, and the output is a graphical window that
displays the current state of the environment (see Fig. 1). The “rgb_array”
mode provides the environment’s state as an RGB image, and the output is a
numpy array representing the RGB image of the environment. The “ansi” mode
provides a text-based representation of the environment’s state, and the
output is a string that represents the current state of the environment
using ASCII characters (see Fig. 2).
Figure 1: “human” mode presentation for the taxi navigation problem in
Gym library.
You are free to choose the presentation mode between “human” and
“ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
description, there are six discrete deterministic actions that are presented in
Table 1.
Figure 2: “ansi” mode presentation for the taxi navigation problem in Gym
library. Gold represents the taxi location, blue is the pickup location, and
purple is the drop-off location.
Table 1: Six possible actions in the taxi navigation environment.
Action                  Number of the action
Move South              0
Move North              1
Move East               2
Move West               3
Pickup Passenger        4
Drop off Passenger      5
For this assignment, you need to implement the Q-learning and SARSA
algorithms for the taxi navigation environment. The main objective is for
the agent (taxi) to learn how to navigate the grid world and deliver the
passenger in the minimum possible number of steps. To accomplish the
learning task, you should empirically determine the hyperparameters, e.g.,
the learning rate α, the exploration parameter (such as ε or T), and the
discount factor γ for your algorithm. Your agent should be penalized -1 per
step it takes, receive a +20 reward for delivering the passenger, and incur
a -10 penalty for executing “pickup” and “drop-off” actions illegally. You
should try different exploration parameters to find the best balance
between exploration and exploitation.
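The two exploration parameters mentioned above usually correspond to ε-greedy
selection (ε) and softmax, or Boltzmann, selection with a temperature (T). The
sketch below illustrates both under that assumption; the function names and
the list-based Q-value row are hypothetical choices, not part of the
assignment or of the Gym API.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties at random so an all-zero row does not always yield action 0.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def softmax_action(q_values, temperature, rng=random):
    """Sample an action with probability proportional to exp(Q / T)."""
    top = max(q_values)  # subtracting the max keeps exp() from overflowing
    weights = [math.exp((q - top) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights)[0]
```

A larger ε or T makes the agent explore more; as either value approaches zero,
both functions approach purely greedy selection.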
As an outcome, you should plot the accumulated reward per episode and
the number of steps taken by the agent in each episode for at least 1000
learning episodes for both the Q-learning and SARSA algorithms. Examples
of these two plots are shown in Figures 3–6. Please note that the provided
plots are just examples and, therefore, your plots will not be exactly like the
provided ones, as the learning parameters will differ for your algorithm.
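One possible way to collect the two per-episode series for these plots is
sketched below. It assumes the Gym 0.26 or later interface, where `reset()`
returns `(observation, info)` and `step()` returns five values. The function
name `train`, the nested-list Q-table, and every default hyperparameter are
illustrative placeholders, not recommended settings.

```python
import random


def train(env, n_states, n_actions, method="q_learning", episodes=1000,
          alpha=0.1, gamma=0.99, epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning or SARSA; returns (Q, rewards, steps) per episode."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def choose(state):
        # Epsilon-greedy selection over the current Q-table row.
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        row = Q[state]
        return row.index(max(row))

    rewards, steps = [], []
    for _ in range(episodes):
        state, _info = env.reset()
        action = choose(state)
        total, t = 0.0, 0
        for t in range(1, max_steps + 1):
            nxt, reward, terminated, truncated, _info = env.step(action)
            if method == "sarsa":
                # On-policy: bootstrap from the action that will be taken next.
                nxt_action = choose(nxt)
                bootstrap = Q[nxt][nxt_action]
            else:
                # Off-policy: bootstrap from the best action in the next state.
                bootstrap = max(Q[nxt])
            target = reward + (0.0 if terminated else gamma * bootstrap)
            Q[state][action] += alpha * (target - Q[state][action])
            if method != "sarsa":
                nxt_action = choose(nxt)  # Q-learning picks after the update
            total += reward
            state, action = nxt, nxt_action
            if terminated or truncated:
                break
        rewards.append(total)
        steps.append(t)
    return Q, rewards, steps
```

With a Gym environment, the sizes can be read from `env.observation_space.n`
and `env.action_space.n`. The returned `rewards` and `steps` lists hold one
value per episode and can be passed directly to a plotting library.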
Figure 3: Q-learning reward. Figure 4: Q-learning steps.
Figure 5: SARSA reward. Figure 6: SARSA steps.
After training your algorithm, you should save your Q-values. Based on
your saved Q-table, your algorithms will be tested on at least 100 random
grid-world scenarios with the same characteristics as the taxi environment,
for both the Q-learning and SARSA algorithms, using the greedy action
selection method. Therefore, your Q-table will not be updated during
testing.
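Since the storage format is left open, one simple option is JSON. The sketch
below assumes the Q-table is a nested Python list with one row per state; the
helper names are hypothetical. A NumPy array would need `.tolist()` before
saving.

```python
import json
from pathlib import Path


def save_q_table(q_table, path):
    """Write a Q-table (list of per-state lists of floats) to a JSON file."""
    Path(path).write_text(json.dumps(q_table))


def load_q_table(path):
    """Read a Q-table previously written by save_q_table."""
    return json.loads(Path(path).read_text())


def greedy_action(q_table, state):
    """Pure exploitation: index of the largest Q-value; nothing is updated."""
    row = q_table[state]
    return row.index(max(row))
```

During testing only `greedy_action` is called, so the loaded table stays
unchanged.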
Your code should be able to visualize the trained agent for both the Q-
learning and SARSA algorithms. This means you should render the “Taxi-
v3” environment (you can use the “ansi” mode) and run your trained agent
from a random position. You should present the steps your agent is taking
and how the reward changes from one state to another. An example of the
visualized agent is shown in Fig. 7, where only the first six steps of the taxi
are displayed.
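A minimal sketch of such a visualization follows. It assumes an environment
created with render_mode="ansi" under Gym 0.26 or later, so that `render()`
returns a string, and a Q-table stored as a nested list. The function name
`replay_greedy` is hypothetical.

```python
def replay_greedy(env, q_table, max_steps=100):
    """Print each frame, action, reward and running total for one episode."""
    state, _info = env.reset()
    print(env.render())  # initial frame before any action is taken
    total = 0.0
    for step in range(1, max_steps + 1):
        row = q_table[state]
        action = row.index(max(row))  # greedy action from the trained table
        state, reward, terminated, truncated, _info = env.step(action)
        total += reward
        print(f"step {step}: action={action} reward={reward} total={total}")
        print(env.render())
        if terminated or truncated:
            break
    return step, total
```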
2 Testing and discussing your code
As part of the assignment evaluation, your code will be tested by tutors
along with you in a discussion carried out in the tutorial session in week 10.
The assignment has a total of 25 marks. The discussion is mandatory and,
therefore, we will not mark any assignment not discussed with tutors.
Figure 7: The first six steps of a trained agent (taxi) based on Q-learning
algorithm.
Before your discussion session, you should prepare the necessary code for
this purpose by loading your Q-table and the “Taxi-v3” environment. You
should be able to calculate the average number of steps per episode and the
average accumulated reward (for a maximum of 100 steps for each episode)
for the test episodes (using the greedy action selection method).
You are expected to propose and build your algorithms for the taxi
navigation task. You will receive marks for each of these subsections as
shown in Table 2. Beyond what is required in the previous section, you are
welcome to include any other outcome that highlights particular aspects
when testing and discussing your code with your tutor.
For both Q-learning and SARSA algorithms, your tutor will consider the
average accumulated reward and the average number of steps taken over the
test episodes in the environment, with a maximum of 100 steps for each
episode. For your Q-learning algorithm, the agent should take at most 14
steps per episode on average and obtain an average accumulated reward of at
least 7. Numbers worse than that will result in a score of 0 marks for that
specific section. For your SARSA algorithm, the agent should take at most
15 steps per episode on average and obtain an average accumulated reward of
at least 5. Numbers worse than that will result in a score of 0 marks for
that specific section.
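The averages described above could be computed along the following lines.
This is a sketch assuming the Gym 0.26 or later `reset()`/`step()` interface
and a Q-table stored as a nested list; the names `evaluate` and
`meets_thresholds` are hypothetical.

```python
def evaluate(env, q_table, episodes=100, max_steps=100):
    """Run greedy test episodes; return (average steps, average reward)."""
    total_steps, total_reward = 0, 0.0
    for _episode in range(episodes):
        state, _info = env.reset()
        for _step in range(max_steps):
            row = q_table[state]
            action = row.index(max(row))  # greedy; the table is never updated
            state, reward, terminated, truncated, _info = env.step(action)
            total_steps += 1
            total_reward += reward
            if terminated or truncated:
                break
    return total_steps / episodes, total_reward / episodes


def meets_thresholds(avg_steps, avg_reward, max_avg_steps, min_avg_reward):
    """Compare the averages with the pass marks for one algorithm."""
    return avg_steps <= max_avg_steps and avg_reward >= min_avg_reward
```

For example, `meets_thresholds(avg_steps, avg_reward, 14, 7)` applies the
Q-learning limits stated above, and `meets_thresholds(avg_steps, avg_reward,
15, 5)` applies the SARSA limits.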
Finally, you will receive 1 mark for code readability for each task, and
your tutor will also give you a maximum of 5 marks for each task depending
on the level of code understanding as follows: 5. Outstanding, 4. Great,
3. Fair, 2. Low, 1. Deficient, 0. No answer.
Table 2: Marks for each task.
Task                                                            Marks
Results obtained from agent learning
  Accumulated rewards and steps per episode plots for
  Q-learning algorithm                                          2 marks
  Accumulated rewards and steps per episode plots for
  SARSA algorithm                                               2 marks
Results obtained from testing the trained agent
  Average accumulated rewards and average steps per
  episode for Q-learning algorithm                              2.5 marks
  Average accumulated rewards and average steps per
  episode for SARSA algorithm                                   2.5 marks
  Visualizing the trained agent for Q-learning algorithm        2 marks
  Visualizing the trained agent for SARSA algorithm             2 marks
Code understanding and discussion
  Code readability for Q-learning algorithm                     1 mark
  Code readability for SARSA algorithm                          1 mark
  Code understanding and discussion for Q-learning algorithm    5 marks
  Code understanding and discussion for SARSA algorithm         5 marks
Total marks                                                     25 marks
3 Submitting your assignment
The assignment must be done individually. You must submit your assignment
solution via Moodle. The submission consists of a single .zip file
containing three files: the .ipynb Jupyter notebook and your two saved
Q-tables, one for Q-learning and one for SARSA (you can choose the format
for the Q-tables). Remember that the files with your Q-tables will be
loaded during your discussion session to run the test episodes. Therefore,
your submitted Python code should also include a script to perform these
tests. Additionally, your code should include short text descriptions to
help markers better understand it. Please be mindful that providing clean
and easy-to-read code is part of your assignment.
Please indicate your full name and your zID at the top of the file as a
comment. You can submit as many times as you like before the deadline;
later submissions overwrite earlier ones. After submitting your file, it is
good practice to take a screenshot of it for future reference.
Late submission penalty: UNSW has a standard late submission penalty of 5%
per day, deducted from your mark and capped at five days from the
assessment deadline. After that, students cannot submit the assignment.
4 Deadline and questions
Deadline: Week 9, Wednesday, 24 July 2024, 11:55 PM. Please use the
forum on Moodle to ask questions related to the project. We will prioritise
questions asked in the forum. However, you should not share your code
there, to avoid making it public and enabling plagiarism. If your question
requires sharing code, use the course email cs9414@cse.unsw.edu.au instead.
Although we try to answer questions as quickly as possible, we might take
1 or 2 business days to reply. Last-minute questions might therefore not be
answered in time.
For any questions regarding the discussion sessions, please contact your
tutor directly. Your tutor's email address is listed in Table 3.
5 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.
