

        Date: 2024-10-12  Source: hfw.cc (Hefei Net)  Author: hfw.cc



        Fall 2024 
        CS 259 Lab 1 
        Accelerating Convolutional Neural Network (CNN) on FPGAs using 
        Merlin Compiler 
        Due October 9 11:59pm 
        Description 
        Your task is to accelerate the computation of two layers in a convolutional neural network 
        (CNN) using a high-level synthesis (HLS) tool on an FPGA. We encourage you to start with 
        using the Merlin Compiler. For an input image with 228 × 228 pixels and 256 channels, you 
        are going to calculate the tensor after going through a 2D convolution layer and a 2D max 
        pooling layer. The convolution layer has 256 filters of shape 256 × 5 × 5, adds a bias 
        value to each output channel, and applies the ReLU activation relu(x) = max{x, 0}. The 
        2D max pooling layer operates on 2 × 2 non-overlapping windows. You will need to 
        implement this function using HLS: 
        void CnnKernel(const float* input, const float* weight, const float* bias, float* output) 
        where input is the input image of size [256][228][228], weight stores the weights of the 
        convolution filters of size [256][256][5][5], bias stores the offset values of size [256] that 
        are added to the output channels, and output is the buffer to which you write the result 
        of maxpool(relu(conv2d(input, weight) + bias)) as defined above. The output 
        size is [256][112][112]. 
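        The size arithmetic and the composition maxpool(relu(conv2d(input, weight) + bias)) 
        can be sketched in plain C++ as below. This is only an illustrative CPU reference, not 
        the starter-kit code: the names PooledSize and ReferenceCnn, the std::vector buffers, 
        and the row-major index expressions are assumptions made for the example. 

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Width/height after a convolution without padding followed by
// non-overlapping pooling: (in_size - kernel + 1) / pool.
constexpr int PooledSize(int in_size, int kernel, int pool) {
  return (in_size - kernel + 1) / pool;
}

// 228 -> 224 after the 5 x 5 convolution -> 112 after 2 x 2 pooling.
static_assert(PooledSize(228, 5, 2) == 112, "matches the stated output size");

// Hypothetical reference routine (assumed row-major layouts):
//   input[num][in_size][in_size], weight[num][num][kernel][kernel],
//   bias[num], output[num][out][out].
void ReferenceCnn(const std::vector<float>& input,
                  const std::vector<float>& weight,
                  const std::vector<float>& bias,
                  std::vector<float>& output,
                  int num, int in_size, int kernel) {
  const int out = PooledSize(in_size, kernel, 2);
  output.assign(static_cast<std::size_t>(num) * out * out, 0.0f);
  for (int i = 0; i < num; ++i) {
    for (int h = 0; h < out; ++h) {
      for (int w = 0; w < out; ++w) {
        // relu never returns less than 0, so starting the running maximum
        // at 0 folds relu into the max over the 2 x 2 window.
        float best = 0.0f;
        for (int dh = 0; dh < 2; ++dh) {
          for (int dw = 0; dw < 2; ++dw) {
            float acc = bias[i];
            for (int j = 0; j < num; ++j) {
              for (int p = 0; p < kernel; ++p) {
                for (int q = 0; q < kernel; ++q) {
                  const int row = 2 * h + dh + p;
                  const int col = 2 * w + dw + q;
                  acc += weight[((i * num + j) * kernel + p) * kernel + q] *
                         input[(j * in_size + row) * in_size + col];
                }
              }
            }
            best = std::max(best, acc);
          }
        }
        output[(i * out + h) * out + w] = best;
      }
    }
  }
}
```

        For a quick sanity check, a single 6 × 6 channel of ones with a 5 × 5 filter of ones 
        and a bias of 0.5 produces one output value of 25.5. 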
        How-To 
        FPGA accelerator compilation typically involves three (3) stages: high-level synthesis (HLS), 
        bitstream generation, and onboard execution. The last two stages can take days to 
        complete. Therefore, in this lab, we only focus on the first stage: HLS. Your performance will 
        only be assessed using the estimation in the HLS reports, which is usually accurate. 
        However, you are welcome to try out the last two steps if you are interested. 
         
         
         
        Connecting to the Server: Method 1 
        In this method, you won’t be able to run Merlin directly from your /home directory, so you’ll need 
        to copy files back and forth. 
        1. Connect to the server (VPN may be required). You can find VPN details here: 
        https://www.it.ucla.edu/it-support-center/services/virtual-private-network-vpn-clients  
        ssh <username>@brimstone.cs.ucla.edu 
         
        2. Start the Docker container and share your home with -v: 
         
        docker run -v /d0/class/:/home -it vitis2021 /bin/bash 
         
        3. Source Vitis, navigate to the desired directory and clone the repository: 
         
        source /tools/Xilinx/Vitis_HLS/2021.1/settings64.sh 
        cd /opt 
        git clone https://github.com/UCLA-VAST/cs-259-f24.git 
        cd cs-259-f24/lab1 
         
        4. Copy the necessary file to your home directory: 
         
        cp /opt/cs-259-f24/lab1/cnn-krnl.cpp /home/<username> 
        Connecting to the Server: Method 2 
        In this method, you can run Merlin directly from your /home directory, but make sure to export your 
        home directory. 
         
        1. Connect to the server (VPN may be required). You can find VPN details here: 
        https://www.it.ucla.edu/it-support-center/services/virtual-private-network-vpn-clients 
         
        ssh <username>@brimstone.cs.ucla.edu 
         
        2. Start the Docker container and share your home with -v: 
         
        docker run --user $(id -u):100 -v /d0/class/:/home -it vitis2021 /bin/bash 
         
        3. Export your home directory: 
         
        export HOME=/home/<username> 
         
        4. Source Vitis, navigate to your home directory and clone the repository: 
         
        source /tools/Xilinx/Vitis_HLS/2021.1/settings64.sh 
        cd /home/<username> 
        git clone https://github.com/UCLA-VAST/cs-259-f24.git 
        cd cs-259-f24/lab1 
        Build and Run Baseline with Software Simulation 
        We have prepared a starter kit for you. Please run: make 
        This command performs a software simulation of the provided starter FPGA HLS kernel. It 
        should show “PASS”. Unless your computer has Xilinx Vitis HLS installed, you need to use 
        the FPGA Developer AMI for this lab. We still suggest that you develop your code 
        and run software simulation locally to test correctness. You can move to AWS once you 
        enter the tuning stage. 
        Understand Merlin’s Automatic Optimizations 
        Before modifying the kernel and adding pragmas, synthesize the CNN kernel with Merlin and 
        describe in your report the automatic optimizations made by Merlin and how this reduces 
        latency. 
        Modify the HLS CNN Kernel 
        If you have successfully built and run the baseline HLS CNN kernel, you can now optimize 
        the code to design your CNN kernel. Your task is to implement a fast, parallel version of the 
        CNN kernel on FPGA. You should start with the provided starter kit and edit cnn-krnl.cpp 
        for this task. When editing, please use the given types input_t, weight_t, bias_t, 
        and output_t for the corresponding data, and compute_t for your intermediate values. 
        You can use them as if they are float numbers. 
        Parallelism should be exploited by using Merlin pragmas and tiling. You are encouraged to 
        focus on Merlin pragmas (#pragma ACCEL parallel, #pragma ACCEL pipeline and #pragma 
        ACCEL tile). You may also restructure the code by hand (tiling, loop permutation, …), but 
        make sure the modified code is still correct. 
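        As a minimal sketch of manual tiling combined with the Merlin pragmas named above, the 
        example below splits one loop into an outer loop over tiles and an inner loop within a 
        tile. The function names, kSize, and kTile are hypothetical, the pragmas are written 
        without options (consult the Merlin documentation for the options it supports), and a 
        regular C++ compiler simply ignores them. 

```cpp
constexpr int kSize = 224;  // hypothetical trip count
constexpr int kTile = 8;    // hypothetical tile size
static_assert(kSize % kTile == 0, "the tiled loop assumes no remainder");

// Untiled reference: out[i] = a[i] * b[i] + c[i].
void MulAdd(const float* a, const float* b, const float* c, float* out) {
  for (int i = 0; i < kSize; ++i) {
    out[i] = a[i] * b[i] + c[i];
  }
}

// Same computation, tiled by hand. Both loops keep constant bounds.
// The Merlin pragmas are placed directly before the loop they apply to.
void MulAddTiled(const float* a, const float* b, const float* c, float* out) {
#pragma ACCEL pipeline
  for (int io = 0; io < kSize / kTile; ++io) {
#pragma ACCEL parallel
    for (int ii = 0; ii < kTile; ++ii) {
      const int i = io * kTile + ii;  // recover the original index
      out[i] = a[i] * b[i] + c[i];
    }
  }
}
```

        Comparing the outputs of MulAdd and MulAddTiled on the same inputs is a cheap way to 
        confirm that a hand transformation preserved the result before running make. 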
        In the starter kit, we simply wrap a sequential CNN code with #pragma ACCEL kernel, and 
        Merlin automatically performs data caching, memory coalescing, pipelining and 
        parallelization, which yields about 10 GFLOPS. 
        Although the skeleton kernel is provided, you are also free to create your own by removing 
        the header file inclusion of “lib/cnn-krnl.h” and implementing the basic kernel from scratch. 
        However, this would require specific expertise in Xilinx FPGA architecture and is not 
        recommended for this course. 
        Test Your HLS CNN Kernel with Software Simulation 
        To perform software emulation of your FPGA implementation of CNN kernel: 
        make 
        If you see something similar to the following message, your implementation is incorrect. 
        Found 21201** errors 
        FAIL 
        Since the software simulation step uses the CPU to emulate the hardware behavior, it only 
        serves as a correctness test, and its execution time doesn’t reflect that of the actual hardware. 
        Your estimated execution time should be retrieved using the command below: 
        make estimate 
        This command will print out the estimated latency and resource usage of your kernel: 
        +---------------------------+------------------------+----------+----------+---------+--------+-------+------+
        |          Kernel           |         Cycles         |   LUT    |    FF    |  BRAM   |  DSP   | URAM  |Detail|
        +---------------------------+------------------------+----------+----------+---------+--------+-------+------+
        |CnnKernel (cnn-krnl.cpp:12)|4179564052 (16718.256ms)|49558 (4%)|49381 (2%)|810 (18%)|202 (2%)|25 (2%)|-     |
        +---------------------------+------------------------+----------+----------+---------+--------+-------+------+
        The time shown in parentheses in the Cycles column (16718.256 ms in this example) is the 
        estimated execution time of your FPGA kernel. You can compute the performance in FLOPS as 
        “kNum*kNum*kImSize*kImSize*kKernel*kKernel*2/latency”, or divide 164.4 by the latency 
        in seconds to get the performance in GFLOPS. 
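        The conversion from reported latency to GFLOPS can be sketched as follows. The function 
        name GflopsFromLatencyMs is hypothetical. The constants reuse the names from the formula, 
        and kImSize is taken to be 224 (the image size after the 5 × 5 convolution) because that 
        is the value that reproduces the 164.4 constant; treat it as an inference rather than a 
        statement about the starter-kit headers. 

```cpp
constexpr long long kNum = 256;     // input and output channels
constexpr long long kImSize = 224;  // assumed: 228 - 5 + 1
constexpr long long kKernel = 5;    // filter width and height

// One multiply and one add per filter tap: about 164.4e9 operations.
constexpr long long kTotalFlop =
    kNum * kNum * kImSize * kImSize * kKernel * kKernel * 2;
static_assert(kTotalFlop == 164416716800LL, "about 164.4 GFLOP in total");

// Converts the estimated latency in milliseconds to GFLOPS.
double GflopsFromLatencyMs(double latency_ms) {
  const double latency_s = latency_ms * 1e-3;
  return static_cast<double>(kTotalFlop) / latency_s / 1e9;
}
```

        With the 16718.256 ms latency from the sample table, this gives roughly 9.8 GFLOPS, 
        consistent with the figure of about 10 GFLOPS quoted for the starter kit. 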
        IMPORTANT: Please make sure that all your loops have fixed loop bounds. If any of the loop 
        bounds are variable, a performance estimation will not be shown and you will receive no 
        performance grade. 
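        One common way to keep loop bounds fixed, sketched here under assumed names (kMaxLen, 
        SumVariable, SumFixed), is to iterate up to a compile-time maximum and guard the body 
        with a condition instead of using a run-time value as the bound. Whether this is the best 
        fit depends on your kernel; in this lab all dimensions are constants, so plain constant 
        bounds are usually enough. 

```cpp
constexpr int kMaxLen = 64;  // hypothetical compile-time upper bound

// Variable bound: the trip count depends on the run-time value n,
// which is the pattern the note above warns against.
float SumVariable(const float* x, int n) {
  float sum = 0.0f;
  for (int i = 0; i < n; ++i) {
    sum += x[i];
  }
  return sum;
}

// Fixed bound: the loop always runs kMaxLen times, and the guard
// skips the iterations beyond n (requires n <= kMaxLen).
float SumFixed(const float* x, int n) {
  float sum = 0.0f;
  for (int i = 0; i < kMaxLen; ++i) {
    if (i < n) {
      sum += x[i];
    }
  }
  return sum;
}
```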
        IMPORTANT: The “make estimate” command should finish in 30 minutes, or in two hours 
        with highly-complex optimizations. Our recommendation is to halt your estimation using 
        Ctrl-C when the time exceeds 30 minutes, except for your last step (after you reach ~100 
        GOPS). More than 12 hours in the estimation will result in zero for the performance score. 
        As your kernel design becomes more complex, the software simulation and the estimation 
        will start to take a significantly longer time. 
        IMPORTANT: As you apply more optimizations, your resource usage will also increase. 
        Ideally, you should keep applying optimization until your kernel occupies about 80% of these 
        resources. The remaining 20% should be reserved for the interfaces (DRAM/PCI-e controller) 
        and the downstream flows. Please make sure that resource utilization is less than 80% for all 
        FPGA resources. If any of the resources are over this limit, you will receive no performance 
        grade. 
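        The 80% rule can be checked mechanically against the percentages printed by make 
        estimate. The sketch below is only a convenience helper; the Utilization struct and the 
        WithinBudget function are hypothetical and not part of the lab tooling. 

```cpp
// Utilization percentages as printed in the estimate table.
struct Utilization {
  double lut;
  double ff;
  double bram;
  double dsp;
  double uram;
};

// Returns true only if every resource stays below the limit.
bool WithinBudget(const Utilization& u, double limit_percent = 80.0) {
  return u.lut < limit_percent && u.ff < limit_percent &&
         u.bram < limit_percent && u.dsp < limit_percent &&
         u.uram < limit_percent;
}
```

        For the sample table (LUT 4%, FF 2%, BRAM 18%, DSP 2%, URAM 2%), the check passes with 
        a wide margin, which indicates how much room the baseline leaves for optimization. 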
        IMPORTANT: You can check the HLS report by opening merlin.rpt with a text editor. This 
        file will be generated with the command make estimate. You must submit this file with your 
        final submission. Do not modify this file in your submission; all reports will be verified 
        after the submission deadline. Any modification to this file in your submission constitutes academic 
        misconduct and will be reported. 
        Advanced Tips for HLS 
        Kernel Profiling: If you want to “profile” your kernel, you can open merlin.rpt with a text 
        editor and scroll down to Performance Estimate. You can see the trip count, accumulated 
        cycles and cycles per call, as well as pipeline initiation interval and parallel factor for each 
        loop in the table. For resource usage, you can go to Resource Estimate. No loop level 
        information is available, though. If you want to check the resource usage of a code region, 
        you can wrap it with a function then run again. 
        Kernel after transformation: If you want to see the kernel after being transformed by Merlin, 
        you can look for that in .merlin_prj/run/implement/exec/hls/kernel. 
        Annotation for Profiling: If you find the loops in your report hard to read, you can name the 
        loops you are interested in using a goto label. For example, this_loop: for (int i = 0; 
        i < n; i++); 
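        A slightly fuller sketch of the labeling idea, with hypothetical names (SumAll, row_loop, 
        col_loop, kRows, kCols): each label sits directly in front of the loop it names. A regular 
        compiler may warn that the labels are unused, which is harmless here. 

```cpp
constexpr int kRows = 4;  // hypothetical dimensions
constexpr int kCols = 8;

// Sums a small 2D array; the labels exist only to name the loops.
float SumAll(const float (&a)[kRows][kCols]) {
  float total = 0.0f;
row_loop:
  for (int i = 0; i < kRows; ++i) {
  col_loop:
    for (int j = 0; j < kCols; ++j) {
      total += a[i][j];
    }
  }
  return total;
}
```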
        Debugging Pipelining: If you are not sure about why you cannot achieve a specific initiation 
        interval as you expected, you can open the file below and read the logs. HLS usually gives out 
        a reason. 
        .merlin_prj/run/implement/exec/hls/_x/logs/CnnKernel/CnnKernel/vitis_hls.log 
        Long Synthesis Time in Pipelining: HLS synthesis (for generating the estimation) takes a 
        long time if you pipeline a loop with a large loop body. Note that all loops nested inside a 
        pipelined loop are unrolled, so the body can become large even when it looks small in the source. 
        In this case, you may want to swap the order of pipelining and unrolling and see whether the 
        synthesis time improves. 
        Use Functions for Shorter Synthesis Time: If you experience long synthesis time, you may try 
        wrapping some loops into a function and specifying #pragma HLS inline off inside the 
        function body. However, this can make the dependency analysis or memory port analysis 
        less accurate and sometimes lowers performance. A workaround may or may not exist. 
        For example, if the function accesses A[k + i][j], passing A + k to 
        the function and accessing A’[i][j] there can let HLS understand the array partitioning 
        better than passing A. You will need to experiment. 
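        The two calling styles from the example above can be sketched as follows. All names 
        (TileSumWithOffset, TileSum, kTileRows, kCols) are hypothetical, and the sketch only 
        shows that both forms compute the same result; whether the second form actually helps 
        the tool's partitioning analysis has to be verified in your own reports. 

```cpp
constexpr int kCols = 8;      // hypothetical row length
constexpr int kTileRows = 4;  // hypothetical rows handled per call

// Style 1: pass the whole array and index with the offset k inside.
float TileSumWithOffset(const float (*A)[kCols], int k) {
#pragma HLS inline off
  float sum = 0.0f;
  for (int i = 0; i < kTileRows; ++i) {
    for (int j = 0; j < kCols; ++j) {
      sum += A[k + i][j];
    }
  }
  return sum;
}

// Style 2: the caller passes A + k, so the callee indexes from row 0.
float TileSum(const float (*A_tile)[kCols]) {
#pragma HLS inline off
  float sum = 0.0f;
  for (int i = 0; i < kTileRows; ++i) {
    for (int j = 0; j < kCols; ++j) {
      sum += A_tile[i][j];
    }
  }
  return sum;
}
```

        Given an array float A[16][kCols], the calls TileSumWithOffset(A, 4) and TileSum(A + 4) 
        read the same four rows. 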
        General Tips 
        ● When you develop on AWS, to resume a session in case you lose your connection, you 
        can run screen after login. You can recover your session with screen -DRR. You should 
        stop your AWS instance if you are going to come back and resume your work in a few 
        hours or days. Your data will be preserved, but you will be charged for the EBS storage 
        at $0.10 per GB per month (with default settings). You should terminate your instance 
        if you are not going to come back and resume your work. Data on the instance will be 
        lost. 
        ● We recommend using private repositories provided by GitHub to back up your 
        code. Never put your code in a public repo, to avoid potential plagiarism. To check in 
        your code to a private GitHub repo, create the repo first, then run: 
        git branch -m upstream 
        git checkout -b main # skip these two lines if you are reusing the folder in Lab 1 
        # ... make your modifications ... 
        git add cnn-krnl.cpp merlin.rpt 
        git commit -m "lab1: first version" # change commit message accordingly 
        # please replace the URL with your own URL 
        git remote add origin git@github.com:YourGitHubUserName/your-repo-name.git 
        git push -u origin main 
        ● You are recommended to git add and git commit often so that you can keep track of 
        the history and revert whenever necessary. 
        ● Make sure your code produces correct results! 
        (Optional) Modify the HLS CNN Kernel using Vitis Pragmas 
        You are encouraged to use mainly Merlin pragmas. If needed, you can use Vitis pragmas for 
        finer-grained control and optimization. The list of Vitis pragmas can be found in the Vitis 
        HLS documentation. 
        You can simply write Vitis pragmas and Merlin pragmas in the same file (cnn-krnl.cpp), but note 
        that, to apply an HLS pragma to a loop, you need to put the pragma inside the loop body 
        instead of before it. 
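        The placement difference can be sketched as below: the Vitis HLS pragma goes on the first 
        line inside the loop body, whereas a Merlin pragma goes directly before the loop. The 
        function names, kLen, and the choice of II=1 are illustrative assumptions, and a regular 
        C++ compiler ignores both pragmas. 

```cpp
constexpr int kLen = 32;  // hypothetical trip count

// Vitis HLS style: the pragma is inside the loop body it applies to.
void ScaleHls(const float* in, float* out, float factor) {
  for (int i = 0; i < kLen; ++i) {
#pragma HLS pipeline II=1
    out[i] = in[i] * factor;
  }
}

// Merlin style: the pragma is placed before the loop.
void ScaleMerlin(const float* in, float* out, float factor) {
#pragma ACCEL pipeline
  for (int i = 0; i < kLen; ++i) {
    out[i] = in[i] * factor;
  }
}
```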
        Submission 
        You need to report the estimated performance results of your FPGA-based implementation on 
        a Xilinx Ultrascale+ VU9P FPGA (the FPGA we are using, specified in the makefile). Please 
        express your performance in GFLOPS and the speedup compared with the starter-kit version. 
        Your report should also include: 
        ● Please run the input C file through the Merlin Compiler, identify the code 
        transformation and HLS pragmas that Merlin added, and discuss why. 
        ● Please explain the parallelization and optimization strategies you have applied for 
        each loop in the CNN program (convolution, max pooling, etc) in this lab. Include the 
        pragmas (if any) or code segments you have added to achieve your strategy. 
        ● Please incrementally evaluate each parallelization/optimization that you have applied 
        and explain why it improves the performance. 
        ● Please report the FPGA resource (LUT/FF/DSP/BRAM) usage, in terms of resource 
        count and percentage of the total. Which resource has been used most, in terms of 
        percentage? 
        ● Optional: The challenges you faced, and how you overcame them. 
        ● (Bonus +5pts): Analyze your code and check if the DSP/BRAM resource usage 
        matches your expectation. Only the adders, multipliers, and size of arrays need to be 
        considered. Please attach related code segments to your report and show how you 
        computed the expected number. Provide a discussion on possible reasons if they 
        differ significantly. 
        You also need to submit your optimized kernel code. Do not modify code in the lib directory. 
        Please submit on Gradescope. Your final submission should contain only these 
        files, uploaded individually: 
         ├ cnn-krnl.cpp 
         ├ merlin.rpt 
         └ lab**report.pdf 
        File lab**report.pdf must be in PDF format. 
        Grading Policy 
        Your submission will only be graded if it complies with the formatting requirements. 
        Missing reports/code or compilation errors will result in 0 for the corresponding 
        category(ies). 
        Correctness (40%) 
        Please check the correctness using the command “make”. 
        Performance (40%) 
        Your performance will be evaluated based on the estimation report generated using the 
        command “make estimate”. The performance point will be added only if you have the 
        correct result, so please prioritize the correctness over performance. Your performance will 
        be evaluated based on the ranges of throughput (GFLOPS). Ranges A+ and A++ will be defined 
        after all the submissions are graded: 
        ● Range A++, better than Range A+ performance: 40 points + 20 points (bonus) 
        ● Range A+, better than Range A performance: 40 points + 10 points (bonus) 
        ● Range A GFLOPS [200, 280]: 40 points 
        ● Range B GFLOPS [120, 200): 30 points 
        ● Range C GFLOPS [60, 120): 20 points 
        ● Range D GFLOPS [30, 60): 10 points 
        ● Range F GFLOPS [0, 30): 0 points 
         
        Report (20%) 
        Points may be deducted if your report misses any of the sections described above. 
        Academic Integrity 
        All work is to be done individually, and any sources of help are to be explicitly cited. You must 
        not modify the HLS report merlin.rpt in your submission. Any instance of academic 
        dishonesty will be promptly reported to the Office of the Dean of Students. Academic 
        dishonesty includes, but is not limited to, cheating, fabrication, plagiarism, copying code from 
        other students or from the internet, modifying the software-generated report, or facilitating 
        academic misconduct. We’ll use automated software to identify similar sections between 
        different student programming assignments, against previous students’ code, or against 
        Internet sources. We’ll run HLS on all submissions and compare the reproduced HLS 
        report with the submitted report. Students are not allowed to post the lab solutions on public 
        websites (including GitHub). Please note that any version of your submission must be your 
        own work and will be compared with sources for plagiarism detection. 
        Late policy: Late submission will be accepted for 24 hours with a 10% penalty. No late 
        submission will be accepted after that (you lose all points after the late submission time). 
