
        ECE 498 Ghostwriting, Python Design and Programming for Hire

        Date: 2024-11-15  Source: 合肥網 hfw.cc  Author: hfw.cc



        ECE 498/598 Fall 2024, Homeworks 3 and 4
        Remarks:
        1. HW3&4: You can reduce the context length to ** if you are having trouble with the
        training time.
        2. HW3&4: During test evaluation, note that positional encodings for unseen/long
        context are not trained. You are supposed to evaluate it as is. It is OK if it doesn’t
        work well.
        3. HW3&4: Comments are an important component of the HW grade. You are expected
        to explain the experimental findings. If you don’t provide technically meaningful
        comments, you might receive a lower score even if your code and experiments are
        accurate.
        4. The deadline for HW3 is November 11th at 11:59 PM, and the deadline for HW4 is
        November 18th at 11:59 PM. For each assignment, please submit both your code and a
        PDF report that includes your results (figures) for each question. You can generate the
        PDF report from a Jupyter Notebook (.ipynb file) by adding comments in markdown
        cells.
        The objective of this assignment is to compare the transformer architecture with SSM-type
        architectures (specifically Mamba [1]) on the associative recall problem. We provide
        example code, recall.ipynb, which contains an example implementation using a 2-layer
        transformer. You will adapt this code to incorporate different positional encodings, use
        Mamba layers, or modify dataset generation.
        Background: As you recall from class, associative recall (AR) assesses two abilities
        of the model: the ability to locate relevant information and the ability to retrieve the
        context around that information. The AR task can be understood via the following example:
        given the input prompt X = [a 1 b 2 c 3 b], we want the model to locate where the last
        token b occurred earlier and output the associated value Y = 2. This is crucial for
        memory-related tasks or bigram retrieval (e.g. ‘Baggins’ should follow ‘Bilbo’).
        To proceed, let us formally define the associative recall task we will study in the HW.
        Definition 1 (Associative Recall Problem) Let Q be the set of target queries with
        cardinality |Q| = k. Consider a discrete input sequence X of the form X = [... q v ... q] where the
        query q appears exactly twice in the sequence and the value v follows the first appearance
        of q. We say the model f solves AR(k) if f(X) = v for all sequences X with q ∈ Q.
        The induction head problem is a special case of the definition above in which the query q
        is fixed (i.e. Q is a singleton). It is visualized in Figure 1. At the other extreme, we can
        ask the model to solve AR for all queries in the vocabulary.
        Problem Setting
        Vocabulary: Let [K] = {1, . . . , K} be the token vocabulary. Obtain the embedding of
        the vocabulary by randomly generating a K × d matrix V with IID N(0, 1) entries, then
        normalizing its rows to unit length. Here d is the embedding dimension. The embedding of
        the i-th token is V[i]. Use numpy.random.seed(0) to ensure reproducibility.
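The embedding construction described above can be sketched as follows. This is a minimal illustration, not the provided recall.ipynb code: the function name `make_embeddings` is hypothetical, and because NumPy rows are 0-indexed, the sketch assumes token i (with i in 1..K) maps to row i - 1.

```python
import numpy as np

def make_embeddings(K, d, seed=0):
    """Return a K x d matrix of IID N(0, 1) entries with rows scaled to unit length."""
    np.random.seed(seed)                      # seed 0, as the assignment requests
    V = np.random.randn(K, d)                 # IID standard normal entries
    V /= np.linalg.norm(V, axis=1, keepdims=True)  # normalize each row
    return V

V = make_embeddings(K=16, d=8)
row_norms = np.linalg.norm(V, axis=1)         # all equal to 1 up to float error

def embed(token, V):
    """ASSUMPTION: tokens are 1..K, so token i is stored in row i - 1."""
    return V[token - 1]
```

Calling `make_embeddings` twice with the same seed returns the same matrix, which is what makes runs comparable across models.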
        Experimental variables: For the AR task, Q will simply be the first M elements
        of the vocabulary. During experiments, K, d, and M are under our control. Besides these,
        we will also vary two other variables:
        • Context length: We will train these models up to context length L. However, we
        will evaluate with up to 3L. This is to test the generalization of the model to unseen
        lengths.
        • Delay: In the basic AR problem, the value v immediately follows q. Instead, we will
        introduce a delay variable where v will appear τ tokens after q. τ = 1 is the standard.
        Models: The motivation behind this HW is reproducing the results in the Mamba paper.
        However, we will also go beyond their evaluations and identify weaknesses of both
        transformer and Mamba architectures. Specifically, we will consider the following models
        in our evaluations:
        Figure 1: We will work on the associative recall (AR) problem. AR problem requires the
        model to retrieve the value associated with all queries whereas the induction head requires
        the same for a specific query. Thus, the latter is an easier problem. The figure above is
        directly taken from the Mamba paper [1]. The yellow-shaded regions highlight the focus of
        this homework.
        • Transformer: We will use the transformer architecture with 2 attention layers (no
        MLP). We will try the following positional encodings: (i) learned PE (provided code),
        (ii) Rotary PE (RoPE), (iii) NoPE (no positional encoding)
        • Mamba: We will use the Mamba architecture with 2 layers.
        • Hybrid Model: We will use an initial Mamba layer followed by an attention layer.
        No positional encoding is used.
        Hybrid architectures are inspired by the Mamba paper as well as [2] which observes the
        benefit of starting the model with a Mamba layer. You should use public GitHub repos to
        find implementations (e.g. RoPE encoding or Mamba layer). As a suggestion, you can use
        this GitHub Repo for the Mamba model.
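To make the RoPE option concrete, here is a minimal NumPy sketch of the rotation applied to query and key vectors before the attention dot product. It is an illustration under assumptions, not the implementation from any particular repository: the function name `rope`, the base value 10000, and the choice to pair adjacent feature dimensions are all assumptions of this sketch.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotate adjacent feature pairs of x (shape (T, d), d even) by position-dependent angles."""
    T, d = x.shape
    assert d % 2 == 0, "RoPE needs an even feature dimension"
    inv_freq = base ** (-np.arange(0, d, 2) / d)                  # (d/2,) one frequency per pair
    angles = np.asarray(positions)[:, None] * inv_freq[None, :]   # (T, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin                     # 2D rotation, first coordinate
    out[:, 1::2] = x_even * sin + x_odd * cos                     # 2D rotation, second coordinate
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 8))
k = rng.standard_normal((1, 8))
# Attention scores depend only on the relative offset (here 2 in both cases).
score_a = (rope(q, [3]) @ rope(k, [1]).T).item()
score_b = (rope(q, [7]) @ rope(k, [5]).T).item()
```

Because each pair is only rotated, vector norms are unchanged and the score between a query and a key depends on their position difference rather than on absolute positions.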
        Generating training dataset: During training, you train with minibatch SGD (e.g. with
        batch size 64) until satisfactory convergence. You can generate the training sequences for
        AR as follows given (K, d, M, L, τ):
        1. Training sequence length is equal to L.
        2. Sample a query q ∈ Q and a value v ∈ [K] uniformly at random, independently. Recall
        that size of Q is |Q| = M.
        3. Place q at the end of the sequence and place another q at an index i chosen uniformly
        at random from 1 to L − τ.
        4. Place value token at the index i + τ.
        5. Sample the other tokens IID from [K] \ {q}, i.e. the other tokens are drawn uniformly
        at random but are not equal to q.
        6. Set label token Y = v.
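The generation steps above can be sketched in NumPy as follows. The names `make_ar_sequence` and `make_batch` are hypothetical. One deliberate deviation is labeled as an assumption: the text allows the first query index to go up to L − τ, but this sketch caps it one position earlier so that the value at i + τ never lands on the final query.

```python
import numpy as np

def make_ar_sequence(K, M, L, tau, rng):
    """Return (tokens, label) for one AR sequence of length L with tokens in 1..K.

    Positions are 0-indexed here, so the final query sits at index L - 1.
    """
    assert L >= tau + 2, "need room for q, the delayed v, and the final q"
    q = int(rng.integers(1, M + 1))            # query from Q = {1, ..., M}
    v = int(rng.integers(1, K + 1))            # value from [K], independent of q
    others = np.array([t for t in range(1, K + 1) if t != q])
    x = rng.choice(others, size=L)             # filler tokens, never equal to q
    # ASSUMPTION: i ranges over 0..L-2-tau so that i + tau < L - 1.
    i = int(rng.integers(0, L - 1 - tau))
    x[i] = q                                   # first occurrence of the query
    x[i + tau] = v                             # value appears tau tokens later
    x[L - 1] = q                               # query repeated at the end
    return x, v

def make_batch(K, M, L, tau, batch_size, rng):
    """Stack batch_size independent sequences into (B, L) tokens and (B,) labels."""
    pairs = [make_ar_sequence(K, M, L, tau, rng) for _ in range(batch_size)]
    X = np.stack([p[0] for p in pairs])
    Y = np.array([p[1] for p in pairs])
    return X, Y

rng = np.random.default_rng(0)
X, Y = make_batch(K=16, M=1, L=64, tau=1, batch_size=64, rng=rng)
```

Since v is drawn from all of [K] as the text specifies, v can coincide with q, in which case q shows up a third time at the value position.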
        Test evaluation: The test dataset is generated in the same way as above. However, we will
        evaluate on all sequence lengths from τ + 1 to 3L. Note that τ + 2 is the shortest possible sequence.
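One way to organize the length sweep is a small harness that takes a prediction function and a batch generator and returns accuracy per context length. Everything named here (`accuracy_by_length`, `toy_batch`, `oracle_predict`) is hypothetical. The oracle is a rule-based lookup that knows τ and stands in for a trained model purely to check the pipeline, and the sweep starts at τ + 2, the shortest feasible length noted above. L = 64 is used for the demo.

```python
import numpy as np

def toy_batch(T, B, K=16, M=1, tau=1, rng=None):
    """Vectorized stand-in generator: (B, T) tokens in 1..K and (B,) labels."""
    if rng is None:
        rng = np.random.default_rng(0)
    q = rng.integers(1, M + 1, size=B)
    v = rng.integers(1, K + 1, size=B)
    X = rng.integers(1, K, size=(B, T))        # values 1..K-1
    X += (X >= q[:, None])                     # shift up to skip q: uniform on [K] \ {q}
    i = rng.integers(0, T - 1 - tau, size=B)   # ASSUMPTION: keeps i + tau before the final q
    rows = np.arange(B)
    X[rows, i] = q
    X[rows, i + tau] = v
    X[:, -1] = q
    return X, v

def oracle_predict(X, tau=1):
    """Find the first occurrence of the final token and read the token tau steps later."""
    q = X[:, -1]
    first = np.argmax(X == q[:, None], axis=1)  # argmax returns the first True
    return X[np.arange(len(X)), first + tau]

def accuracy_by_length(predict, make_batch, lengths, batch_size=64):
    """Return {length: accuracy}; predict maps (B, T) tokens to (B,) predictions."""
    results = {}
    for T in lengths:
        X, Y = make_batch(T, batch_size)
        results[T] = float(np.mean(predict(X) == Y))
    return results

L, tau = 64, 1
rng = np.random.default_rng(0)
acc = accuracy_by_length(
    lambda X: oracle_predict(X, tau),
    lambda T, B: toy_batch(T, B, tau=tau, rng=rng),
    range(tau + 2, 3 * L + 1),
)
```

Replacing the oracle with a wrapper around a trained model's argmax output would give the per-length accuracies to plot, with lengths above L showing length generalization.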
        Empirical Evidence from Mamba Paper: Table 2 of [1] demonstrates that Mamba can do
        a good job on the induction head problem, i.e. AR with a single query. Additionally, Mamba
        is the only model that exhibits length generalization: even if you train it up to context
        length L, it can still solve AR for context lengths beyond L. On the other hand, since Mamba
        is inherently a recurrent model, it may not solve the AR problem in its full generality. This
        motivates the question: What are the tradeoffs between Mamba and the transformer, and can
        hybrid models help improve performance over both?
        Your assignments are as follows. For each problem, make sure to submit the associated
        code. The code for each problem can be in separate cells (clearly commented) of a single
        Jupyter/Python file.
        Grading structure:
        • Problem 1 will count as your HW3 grade. This only involves Induction Head
        experiments (i.e. M = 1).
        • Problems 2 and 3 will count as your HW4 grade.
        • You will make a single submission.
        Problem 1 (50=25+15+10pts). Set K = 16, d = 8, L = ** or L = 64.
        • Train all models on the induction heads problem (M = 1, τ = 1). After training,
        evaluate the test performance and plot the accuracy of all models as a function of
        the context length (similar to Table 2 of [1]). In total, you will be plotting 5 curves
        (3 Transformers, 1 Mamba, 1 Hybrid). Comment on the findings and compare the
        performance of the models including length generalization ability.
        • Repeat the experiment above with delay τ = 5. Comment on the impact of delay.
        • Which models converge faster during training? Provide a plot of the convergence rate
        where the x-axis is the number of iterations and the y-axis is the AR accuracy over a
        test batch. Make sure to specify the batch size you are using (ideally use ** or 64).
        Problem 2 (30pts). Set K = 16, d = 8, L = ** or L = 64. We will train Mamba, Transformer
        with RoPE, and Hybrid. Set τ = 1 (standard AR).
        • Train Mamba models for M = 4, 8, 16. Note that M = 16 is the full AR (retrieve any
        query). Comment on the results.
        • Train Transformer models for M = 4, 8, 16. Comment on the results and compare
        them against Mamba’s behavior.
        • Train the Hybrid model for M = 4, 8, 16. Comment and compare.
        Problem 3 (20=15+5pts). Set K = 16, d = 64, L = ** or L = 64. We will only train
        Mamba models.
        • Set τ = 1 (standard AR). Train Mamba models for M = 4, 8, 16. Compare against the
        corresponding results of Problem 2. How does the embedding dimension d impact the results?
        • Train a Mamba model for M = 16 with τ = 10. Comment on any differences.



