        COMP9414 24T2
        Artificial Intelligence
        Assignment 2 - Reinforcement Learning
        Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
        1 Problem context
        Taxi Navigation with Reinforcement Learning: In this assignment,
        you are asked to implement Q-learning and SARSA methods for a taxi nav-
        igation problem. To run your experiments and test your code, you should
        make use of the Gym library [1], an open-source Python library for developing
        and comparing reinforcement learning algorithms. You can install Gym on
        your computer simply by using the following command in your command
        prompt:
        pip install gym
        In the taxi navigation problem, there are four designated locations in the
        grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
        episode starts, one taxi starts off at a random square and the passenger is
        at a random location (one of the four specified locations). The taxi drives
        to the passenger’s location, picks up the passenger, drives to the passenger’s
        destination (another one of the four specified locations), and then drops off
        the passenger. Once the passenger is dropped off, the episode ends. To show
        the taxi grid world environment, you can use the following code:
        [1] https://www.gymlibrary.dev/environments/toy_text/taxi/
        import gym

        env = gym.make("Taxi-v3", render_mode="ansi").env
        state = env.reset()
        rendered_env = env.render()
        print(rendered_env)
        In order to render the environment, there are three modes known as
        "human", "rgb_array", and "ansi". The "human" mode visualizes the
        environment in a way suitable for human viewing, and the output is a
        graphical window that displays the current state of the environment (see
        Fig. 1). The "rgb_array" mode provides the environment's state as an RGB
        image, and the output is a NumPy array representing the RGB image of the
        environment. The "ansi" mode provides a text-based representation of the
        environment's state, and the output is a string that represents the
        current state of the environment using ASCII characters (see Fig. 2).
        Figure 1: "human" mode presentation for the taxi navigation problem in
        the Gym library.
        You are free to choose the presentation mode between “human” and
        “ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
        description, there are six discrete deterministic actions that are presented in
        Table 1.
        For this assignment, you need to implement the Q-learning and SARSA
        algorithms for the taxi navigation environment. The main objective of this
        assignment is for the agent (taxi) to learn how to navigate the grid world
        and deliver the passenger in the minimum possible number of steps. To accomplish
        the learning task, you should empirically determine hyperparameters, e.g.,
        the learning task, you should empirically determine hyperparameters, e.g.,
        the learning rate α, exploration parameters (such as ε or T), and the discount
        factor γ for your algorithm. Your agent should be penalized -1 per step it
        takes, receive a +20 reward for delivering the passenger, and incur a -10
        penalty for executing "pickup" and "drop-off" actions illegally. You should
        try different exploration parameters to find the best balance between
        exploration and exploitation.

        Figure 2: "ansi" mode presentation for the taxi navigation problem in the
        Gym library. Gold represents the taxi location, blue is the pickup location,
        and purple is the drop-off location.

        Table 1: Six possible actions in the taxi navigation environment.

        Action                Number of the action
        Move South            0
        Move North            1
        Move East             2
        Move West             3
        Pickup Passenger      4
        Drop off Passenger    5
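        Concretely, the two update rules being asked for can be sketched as small
        NumPy helpers. This is a minimal sketch only: the function names are
        illustrative, not a required interface, and the values of α, γ, and ε
        must still be tuned empirically.

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng):
    """With probability epsilon explore (random action), else exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy: bootstrap from the best action in the next state."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from the action actually chosen next."""
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```

        The only difference between the two algorithms is the bootstrap term,
        which is why SARSA requires the next action to be selected (with the same
        ε-greedy policy) before the update is applied.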
        As an outcome, you should plot the accumulated reward per episode and
        the number of steps taken by the agent in each episode for at least 1000
        learning episodes for both the Q-learning and SARSA algorithms. Examples
        of these two plots are shown in Figures 3–6. Please note that the provided
        plots are just examples and, therefore, your plots will not be exactly like the
        provided ones, as the learning parameters will differ for your algorithm.
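        One way to obtain these quantities is a training loop that logs the
        per-episode totals as it goes. The sketch below uses Q-learning; the
        ChainEnv class is a hypothetical toy stand-in (not Taxi-v3) included only
        to keep the example self-contained, and with the real Gym environment the
        reset/step return values differ between library versions, so adapt
        accordingly.

```python
import numpy as np

class ChainEnv:
    """Hypothetical 4-state toy stand-in for Taxi-v3: action 1 moves right,
    action 0 moves left, and reaching state 3 ends the episode with +20."""
    n_states, n_actions = 4, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, 3) if a == 1 else max(self.s - 1, 0)
        done = self.s == 3
        return self.s, (20.0 if done else -1.0), done

def train(env, n_episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1,
          max_steps=200, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the
    Q-table plus per-episode accumulated rewards and step counts, i.e.
    exactly the data needed for the two required plots."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.n_states, env.n_actions))
    rewards, steps = [], []
    for _ in range(n_episodes):
        s = env.reset()
        total, n = 0.0, 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = int(rng.integers(env.n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s, total, n = s_next, total + r, n + 1
            if done:
                break
        rewards.append(total)
        steps.append(n)
    return Q, rewards, steps

# The two required plots then come straight from the logged lists, e.g.:
#   plt.plot(rewards)  # accumulated reward per episode
#   plt.plot(steps)    # steps per episode
```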
        After training your algorithm, you should save your Q-values. Based on
        your saved Q-table, your algorithms will be tested on at least 100 random
        grid-world scenarios with the same characteristics as the taxi environment for
        both the Q-learning and SARSA algorithms using the greedy action selection
        method. Therefore, your Q-table will not be updated with new steps during
        testing.

        Figure 3: Q-learning reward. Figure 4: Q-learning steps.
        Figure 5: SARSA reward. Figure 6: SARSA steps.
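        Since the spec leaves the Q-table format up to you, one simple option is
        NumPy's .npy format. The helper names below are hypothetical, and the
        greedy selection is what "the Q-table is frozen during testing" amounts to
        in code.

```python
import numpy as np

def save_q_table(Q, path):
    """Persist the learned Q-values so the test episodes can reload
    them later without retraining."""
    np.save(path, Q)

def load_q_table(path):
    """Reload a previously saved Q-table."""
    return np.load(path)

def greedy_action(Q, state):
    """Greedy action selection: during testing we only exploit."""
    return int(np.argmax(Q[state]))
```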
        Your code should be able to visualize the trained agent for both the Q-
        learning and SARSA algorithms. This means you should render the “Taxi-
        v3” environment (you can use the “ansi” mode) and run your trained agent
        from a random position. You should present the steps your agent is taking
        and how the reward changes from one state to another. An example of the
        visualized agent is shown in Fig. 7, where only the first six steps of the taxi
        are displayed.
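        A visualization of this kind can be a greedy roll-out that prints each
        rendered frame together with the per-step and accumulated reward. The
        sketch below assumes a minimal environment interface (render() returning a
        string, step() returning (state, reward, done)); with Gym's Taxi-v3 the
        reset/step return values differ between versions, so adapt accordingly.

```python
import numpy as np

def visualize(env, Q, max_steps=6):
    """Roll out the greedy policy from a reset state, printing each
    rendered frame and how the reward accumulates (cf. Fig. 7)."""
    s = env.reset()
    total = 0.0
    for t in range(max_steps):
        print(env.render())
        a = int(np.argmax(Q[s]))
        s, r, done = env.step(a)
        total += r
        print(f"step {t + 1}: action={a}, reward={r}, accumulated={total}")
        if done:
            break
    return total
```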
        2 Testing and discussing your code
        As part of the assignment evaluation, your code will be tested by tutors
        along with you in a discussion carried out in the tutorial session in week 10.
        The assignment has a total of 25 marks. The discussion is mandatory and,
        therefore, we will not mark any assignment not discussed with tutors.
        Before your discussion session, you should prepare the necessary code for
        this purpose by loading your Q-table and the “Taxi-v3” environment. You
        should be able to calculate the average number of steps per episode and the
        average accumulated reward (for a maximum of 100 steps for each episode)
        for the test episodes (using the greedy action selection method).

        Figure 7: The first six steps of a trained agent (taxi) based on the
        Q-learning algorithm.
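        The averages above (greedy selection, at most 100 steps per episode) could
        be computed with a sketch like the following, again written against a
        minimal environment interface rather than Gym's exact API.

```python
import numpy as np

def evaluate(env, Q, n_episodes=100, max_steps=100):
    """Greedy roll-outs with a frozen Q-table; returns the average number
    of steps and the average accumulated reward over the test episodes."""
    steps, rewards = [], []
    for _ in range(n_episodes):
        s = env.reset()
        total, n = 0.0, 0
        for _ in range(max_steps):
            s, r, done = env.step(int(np.argmax(Q[s])))
            total += r
            n += 1
            if done:
                break
        steps.append(n)
        rewards.append(total)
    return float(np.mean(steps)), float(np.mean(rewards))
```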
        You are expected to propose and build your algorithms for the taxi nav-
        igation task. You will receive marks for each of these subsections as shown
        in Table 2. Beyond what has been mentioned in the previous section, you are
        welcome to include any other outcomes that highlight particular aspects
        when testing and discussing your code with your tutor.
        For both the Q-learning and SARSA algorithms, your tutor will consider the
        average accumulated reward and the average number of steps taken over the
        test episodes, with a maximum of 100 steps per episode. For your Q-learning
        algorithm, the agent should perform at most 14 steps per episode on
        average and obtain an average accumulated reward of at least 7. Numbers
        worse than that will result in a score of 0 marks for that specific section.
        For your SARSA algorithm, the agent should perform at most 15 steps per
        episode on average and obtain an average accumulated reward of at least 5.
        Numbers worse than that will result in a score of 0 marks for that specific
        section.
        Finally, you will receive 1 mark for code readability for each task, and
        your tutor will also give you a maximum of 5 marks for each task depending
        on the level of code understanding as follows: 5. Outstanding, 4. Great,
        3. Fair, 2. Low, 1. Deficient, 0. No answer.
        Table 2: Marks for each task.

        Task                                                              Marks
        Results obtained from agent learning
          Accumulated rewards and steps per episode plots for the
          Q-learning algorithm.                                           2 marks
          Accumulated rewards and steps per episode plots for the
          SARSA algorithm.                                                2 marks
        Results obtained from testing the trained agent
          Average accumulated rewards and average steps per episode
          for the Q-learning algorithm.                                   2.5 marks
          Average accumulated rewards and average steps per episode
          for the SARSA algorithm.                                        2.5 marks
          Visualizing the trained agent for the Q-learning algorithm.     2 marks
          Visualizing the trained agent for the SARSA algorithm.          2 marks
        Code understanding and discussion
          Code readability for the Q-learning algorithm.                  1 mark
          Code readability for the SARSA algorithm.                       1 mark
          Code understanding and discussion for the Q-learning algorithm. 5 marks
          Code understanding and discussion for the SARSA algorithm.      5 marks
        Total                                                             25 marks
        3 Submitting your assignment
        The assignment must be done individually. You must submit your assignment
        solution via Moodle. This will consist of a single .zip file including three
        files: the .ipynb Jupyter notebook and your saved Q-tables for Q-learning
        and SARSA (you can choose the format for the Q-tables). Remember that your
        Q-table files will be loaded during your discussion session to run the
        test episodes. Therefore, your submitted Python code should also include
        a script to perform these tests. Additionally, your code should
        include short text descriptions to help markers better understand your code.
        Please be mindful that providing clean and easy-to-read code is a part of
        your assignment.
        Please indicate your full name and your zID at the top of the file as a
        comment. You can submit as many times as you like before the deadline –
        later submissions overwrite earlier ones. After submitting your file, a good
        practice is to take a screenshot of it for future reference.
        Late submission penalty: UNSW has a standard late submission penalty of
        5% per day deducted from your mark, capped at five days from the assessment
        deadline; after that, students cannot submit the assignment.
        4 Deadline and questions
        Deadline: Week 9, Wednesday, 24 July 2024, 11:55 PM. Please use the
        forum on Moodle to ask questions related to the project. We will prioritise
        questions asked in the forum. However, you should not share your code
        there, to avoid making it public and enabling plagiarism. For questions
        that involve your code, use the course email cs9414@cse.unsw.edu.au instead.
        Although we try to answer questions as quickly as possible, we might take
        up to one or two business days to reply; therefore, last-minute questions
        might not be answered in time.
        For any questions regarding the discussion sessions, please contact your
        tutor directly. You can find your tutor's email address in Table 3.
        5 Plagiarism policy
        Your program must be entirely your own work. Plagiarism detection software
        might be used to compare submissions pairwise (including submissions for
        any similar projects from previous years) and serious penalties will be applied,
        particularly in the case of repeat offences.
        Do not copy from others. Do not allow anyone to see your code.
        Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
        require further clarification on this matter.
