
        CS257 Ghostwriting, C/C++ Programming Assignment Help

        Date: 2024-02-29  Source: 合肥網 hfw.cc  Author: hfw.cc



        CS257 Advanced Computer Architecture
        Coursework Assignment
        Term 2, 2023/24
        Contents
        1 Introduction
        2 Submission
        3 Introduction to ACACGS
        4 Compiling and Running the Code
        4.1 Visualisation Generation
        5 Hardware Details
        6 How will my code be tested for performance?
        7 Rules
        8 Where do I start?
        9 Instructions for Submission
        10 Support
        1 Introduction
        The purpose of this coursework is to give you some hands-on experience in code optimisation. By the time you read
        this, you will have encountered a variety of code optimisation techniques including loop unrolling and vectorisation.
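As a rough illustration of the loop unrolling mentioned above, the sketch below applies it to a dot product. The function names are hypothetical and are not taken from the coursework code, and the unroll factor of four is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one multiply-add per loop iteration, one accumulator. */
double dot_simple(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Unrolled by four with independent accumulators, so that consecutive
 * additions do not all depend on the previous one. */
double dot_unrolled4(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    double sum = (s0 + s1) + (s2 + s3);

    /* Remainder loop for lengths that are not a multiple of four. */
    for (; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Because the unrolled version adds the products in a different order, its result can differ from the baseline in the last few bits for general inputs.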
        2 Submission
        Your submission will consist of two parts:
        1. Optimised Code (60%)
        A piece of C code based on the initial implementation provided. This C code will be assessed with respect
        to your selection and understanding of optimisations, functional correctness, i.e., producing the right answer,
        and execution speed.
        2. Written Report (40%)
        A report (4 pages maximum, excluding references) detailing your design and implementation decisions. Your
        report will be evaluated with respect to your understanding of code optimisation techniques and the optimisations you attempted. This means that your report should explain:
        (a) which optimisations you did and did not use;
        (b) why your chosen optimisations improve performance; and
        (c) how your chosen optimisations affect floating-point correctness.
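Point (c) matters because floating-point addition is not associative, so an optimisation that reorders a sum can change the answer. The sketch below is one way to see this, assuming IEEE 754 double arithmetic without extended intermediate precision; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum in index order with a single accumulator. */
double sum_sequential(const double *v, size_t n)
{
    assert(n == 0 || v != NULL);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Sum with two accumulators (even and odd indices), which is the
 * order a two-way unrolled loop would typically use. */
double sum_two_accumulators(const double *v, size_t n)
{
    assert(n == 0 || v != NULL);
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)          /* odd length: one element left over */
        s0 += v[i];
    return s0 + s1;
}
```

For the input {1e16, 1.0, -1e16, 1.0}, whose exact sum is 2, the sequential version loses the first 1.0 when it is added to 1e16 and returns 1.0, while the two-accumulator version returns 2.0.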
        Given that you may apply many different optimisations, a sensible approach is to build your solution incrementally, saving each partial solution and documenting the impact of each optimisation you make. This means that it
        is in your interest to attempt as many different optimisations or combinations of optimisations as you can.
        You may discuss optimisation techniques with others but you are not allowed to collaborate on solutions to this
        assignment. Please remember that the University takes all forms of plagiarism seriously.
        3 Introduction to ACACGS
        ACACGS is a conjugate gradient proxy application for a 3D mesh. The simulation will execute for either a fixed
        number of timesteps or alternatively until the residual value falls below a given threshold. This is done for a given
        mesh size, which is passed in at runtime through command-line arguments.
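To make the command-line interface concrete, here is a minimal sketch of reading three mesh dimensions from the arguments. It is not the parsing code shipped with the coursework: the function names are hypothetical, and it assumes exactly three positional arguments, ignoring any optional flags the real program may accept.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse one positive dimension; returns 0 on success, -1 on bad input. */
int parse_dim(const char *text, long *out)
{
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || value <= 0)
        return -1;
    *out = value;
    return 0;
}

/* Expects argv = { program, x, y, z }; returns 0 on success, -1 otherwise. */
int parse_mesh_dims(int argc, char **argv, long dims[3])
{
    assert(argv != NULL && dims != NULL);
    if (argc != 4)
        return -1;
    for (int i = 0; i < 3; i++) {
        if (parse_dim(argv[i + 1], &dims[i]) != 0)
            return -1;
    }
    return 0;
}

/* Number of cells in the cuboid mesh. */
long cell_count(const long dims[3])
{
    return dims[0] * dims[1] * dims[2];
}
```

Rejecting non-numeric or non-positive values up front keeps a mistyped dimension from silently producing an empty or enormous mesh.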
        In this proxy application, a force is applied to each edge boundary of the cuboid, which is then propagated
        throughout the mesh. As each time step passes, the force is dissipated within the mesh, until either the residual
        is small enough that the simulation stops (as there are no more calculations to perform) or a set number of
        time steps have passed.
        In addition to providing numeric solutions, the code can also generate visuals which depict the pressure within
        the mesh throughout the simulation run. Creating the visualisations relies on two optional packages, Silo and VisIt,
        which are available on the DCS systems.
        Figure 1: Pressure Matrix Visualisation
        4 Compiling and Running the Code
        The code includes a Makefile to build the program. You can compile all of the code using the command make.
        You should not modify the Makefile, but examining it may prove helpful in some situations.
        While the DCS machines do include a version of gcc, it is preferable to use a more recent version. On the DCS
        systems, you can make version 9 the default by using the module load gcc9 command. Once this is loaded you
        can simply type make to build the code, which will create an executable named acacgs in the directory. To clean
        up the directory, you can run make clean.
        To run the code, you need to provide the three dimensions for the mesh as three parameters to the executable.
        For example, to execute the provided code on a small 10x10x10 mesh you would enter ./acacgs 10 10 10. On my
        system, the code produces the output below. This information is also stored in a file, which is named after the wallclock
        date and time at which the program was first executed (for example, 2023_01_26_12_00_00.txt).
        ===== Final Statistics =====
        Executable name: ./acacgs
        Dimensions: 10 10 10
        Number of iterations: 149
        Final residual: 2.226719e-92
        === Time ==
        Total: 1.126600e-02 seconds
        ddot Kernel: 8.390000e-04 seconds
        waxpby Kernel: 1.087000e-03 seconds
        sparsemv Kernel: 9.123000e-03 seconds
        === FLOP ==
        Total: 9.536000e+06 floating point operations
        ddot Kernel: 5.960000e+05 floating point operations
        waxpby Kernel: 8.940000e+05 floating point operations
        sparsemv Kernel: 8.046000e+06 floating point operations
        === MFLOP/s ==
        Total: 8.464406e+02 MFLOP/s
        ddot Kernel: 7.103695e+02 MFLOP/s
        waxpby Kernel: 8.224471e+02 MFLOP/s
        sparsemv Kernel: 8.819467e+02 MFLOP/s
        Difference between computed and exact = 1.110223e-15
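The three statistics sections are related: each MFLOP/s figure is the operation count divided by the corresponding time, scaled to millions. The helper below is a small sketch of that arithmetic; its name is hypothetical and it is not part of the provided code.

```c
#include <assert.h>

/* Millions of floating-point operations per second. */
double mflops(double flop, double seconds)
{
    assert(seconds > 0.0);
    return flop / seconds / 1.0e6;
}
```

For the sparsemv kernel above, mflops(8.046e6, 9.123e-3) evaluates to roughly 881.95, in line with the reported 8.819467e+02 MFLOP/s.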
        You will find more detailed instructions to build the code in the README.md file, including flags to turn on
        verbose mode, which will output details for each timestep in the simulation, and flags for enabling visualisation.
        4.1 Visualisation Generation
        To enable visualisation outputs, you must build your code using make SILO=1. This will then compile your code
        in a way which produces files suitable for visualisation in VisIt. If you are working remotely and want to visualise
        the coursework, it will be quicker and easier for you to copy the files to your local machine, then utilise VisIt on
        the local machine to visualise the cuboid. Before you make the program, make sure you load the SILO module
        (module load cs257-silo).
        When the program is run with visualisations, each timestep will produce a SILO file within a directory named
        after the wallclock date and time (for example: 2023_01_26_12_00_00). In this directory will be a collection of
        .silo files, each named outputXXXX.silo, where XXXX represents the timestep it relates to.
        Once the program has finished, these can be utilised in VisIt. To do so, load the VisIt module (module load
        cs257-visit) and open VisIt using the command visit. From here, you will get two windows. The smaller, skinnier
        one is the control window and is used to manage everything that will be displayed. The larger window is the display
        window. In the control window, select Open, and navigate to the directory with the SILO files. You should then
        be able to select these SILO files.
        Now that the SILO files have been loaded, we can draw some of the variables. To do this, click on Add
        and select a mode and a variable to view. One of the nicest ones to use is Volume with either x_nodal
        or p_nodal. When you have finished adding elements, click on Draw. This will generate an image in the display
        window, which can be dragged around so that the cuboid can be viewed from different angles. The control window
        has a play button, which will run through each timestep.
        Visualisations are nice to have, but for performance purposes we turn them off as they write a significant amount
        of data to disk.
        Table 1: Visualisation Data File Sizes
        x      y      z      Cells        Approximate Data Size
        10     10     10     1,000        4 MB
        25     25     25     15,625       39 MB
        50     50     50     125,000      301 MB
        100    100    100    1,000,000    2.4 GB
        200    200    200    8,000,000    19.3 GB
        There is the potential to go significantly over your DCS disk quota with large meshes. I recommend that you
        do not exceed 30x30x30 for producing visualisations on the DCS machines. If you are developing your solution on
        your personal machine then you may wish to produce larger visualisations.
        5 Hardware Details
        On a Linux system, you can read the processor information using the command cat /proc/cpuinfo or lscpu.
        This will provide full details on the CPU in the machine, including the CPU model, number of cores, the clock
        frequency and supported extensions. I strongly recommend taking a look at this on your development machine.
        For the purposes of assessment, your code will be run on a DCS machine with 4 cores. The output from lscpu
        can be seen below:
        Architecture: x86_64
        CPU op-mode(s): 32-bit, 64-bit
        Byte Order: Little Endian
        CPU(s): 4
        On-line CPU(s) list: 0-3
        Thread(s) per core: 1
        Core(s) per socket: 4
        Socket(s): 1
        NUMA node(s): 1
        Vendor ID: GenuineIntel
        CPU family: 6
        Model: 158
        Model name: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
        Stepping: 9
        CPU MHz: 3400.000
        CPU max MHz: 3800.0000
        CPU min MHz: 800.0000
        BogoMIPS: 6816.00
        Virtualization: VT-x
        L1d cache: 32K
        L1i cache: 32K
        L2 cache: 256K
        L3 cache: 6144K
        NUMA node0 CPU(s): 0-3
        Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
        clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
        constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid
        aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
        sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
        xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd
        ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1
        avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec
        xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear
        flush_l1d arch_capabilities
        Machines matching this specification are available in the cs257 queue of the Batch Compute System in the
        Department (referred to as kudu in the labs). You will learn how to use this system during the lab sessions, so
        there will be time to get used to it.
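When deciding which instruction sets are safe to rely on, it can help to check the Flags line programmatically rather than by eye. The sketch below searches a flags string for a whole-word match, so that looking for avx does not accidentally match avx2. The function name is hypothetical, and the flags string is assumed to have been copied from lscpu or /proc/cpuinfo output.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears in `flags` as a complete
 * whitespace-delimited token. */
bool has_cpu_flag(const char *flags, const char *name)
{
    assert(flags != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return false;

    for (const char *p = strstr(flags, name); p != NULL;
         p = strstr(p + 1, name)) {
        bool starts_token = (p == flags) || isspace((unsigned char)p[-1]);
        bool ends_token = (p[len] == '\0') || isspace((unsigned char)p[len]);
        if (starts_token && ends_token)
            return true;
    }
    return false;
}
```

In the listing above, avx, avx2 and fma all appear as flags, while no avx512 flag is present.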
        6 How will my code be tested for performance?
        Your submission will be tested on a range of input sizes to evaluate how robust your performance improvements
        are. It is recommended that you try testing your solution on inputs that are not cubes to see if there are any
        weaknesses in your optimisation strategies. The 7-pt stencil option will not be used for testing your code.
        Your code will be executed five times for each problem size on the target hardware. The highest and lowest
        runtimes will be discarded, and the mean of the three remaining values will be taken as your runtime for that
        problem size.
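The scoring rule described above can be reproduced when benchmarking your own changes. The following is a minimal sketch, with a hypothetical function name, that discards the single highest and single lowest runtime and averages the rest; with five runs this leaves three values.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the runtimes after removing one minimum and one maximum.
 * Requires at least three measurements. */
double trimmed_mean_runtime(const double *runtimes, size_t n)
{
    assert(runtimes != NULL && n >= 3);

    double sum = runtimes[0];
    double lowest = runtimes[0];
    double highest = runtimes[0];

    for (size_t i = 1; i < n; i++) {
        sum += runtimes[i];
        if (runtimes[i] < lowest)
            lowest = runtimes[i];
        if (runtimes[i] > highest)
            highest = runtimes[i];
    }
    return (sum - lowest - highest) / (double)(n - 2);
}
```

Dropping the extremes makes the figure less sensitive to a single unusually slow or fast run than a plain mean would be.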
        7 Rules
        Your submitted solution must:
        • Compile on the DCS workstations.
        Your submitted solution must not:
        • Alter the Makefile or add or edit any compiler flags;
        • Use instruction sets not supported by the DCS machines;
        • Require additional hardware e.g., GPUs;
        • Add relaxed math options to the compile line, e.g., -ffast-math. Note: Manual use of approximate math
        functions is acceptable.
        8 Where do I start?
        This can seem like a daunting project, but we can break it down into a number of steps.
        1. Compile and run the code as provided. This is a quick, easy check to make sure your environment is set up
        correctly.
        2. Read the code. Start in main.c and follow it through. The functions are well documented with Doxygen
        comments. Don’t panic - you are not expected to understand the physics in the code.
        3. Measure the runtime of the code for reference purposes.
        4. Figure out where the most intensive sections of code are.
        5. Develop a small optimisation.
        6. Run the code and review the impact of your changes.
        7. Repeat steps 5 and 6 until you have exhausted your performance ideas.
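For steps 3 to 6, the program's own timing output is the primary reference, but a small harness can be handy when experimenting with one routine in isolation. The sketch below is an assumption-laden example: time_kernel, example_kernel and struct axpby_args are hypothetical names, and the kernel is only a stand-in for whatever routine you are studying. It uses the standard C clock() function, which measures processor time rather than wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Arguments for the stand-in kernel: w = alpha * x + beta * y. */
struct axpby_args {
    size_t n;
    double alpha, beta;
    const double *x, *y;
    double *w;
};

/* Hypothetical stand-in for a routine under study. */
void example_kernel(void *p)
{
    struct axpby_args *a = p;
    for (size_t i = 0; i < a->n; i++)
        a->w[i] = a->alpha * a->x[i] + a->beta * a->y[i];
}

/* Average processor time in seconds for one call of `kernel`,
 * measured over `repeats` back-to-back calls. */
double time_kernel(void (*kernel)(void *), void *arg, int repeats)
{
    assert(kernel != NULL && repeats > 0);
    clock_t start = clock();
    for (int r = 0; r < repeats; r++)
        kernel(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / repeats;
}
```

Repeating the call several times helps when a single call is shorter than the clock resolution. For single-threaded code, processor time and wall-clock time are usually close, but they are not the same measurement.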
        9 Instructions for Submission
        Your solution should be submitted using Tabula. Please ensure that your code works on DCS machines prior to
        submission.
        Submission Deadline: Wednesday 20th March 2024 @ 12 Noon
        Files Required: A single file named coursework.zip which should contain all of your code at the top-level (i.e.
        no subdirectories) and the report file as a PDF. All files should be submitted through Tabula.
        10 Support
        Support is available from one of your Teaching Assistants: Stephen Xu (stephen.xu@warwick.ac.uk), James
        Macer-Wright (james.macer-wright@warwick.ac.uk), or the module organiser via email.