合肥生活安徽新聞合肥交通合肥房產生活服務合肥教育合肥招聘合肥旅游文化藝術合肥美食合肥地圖合肥社保合肥醫院企業服務合肥法律

        代寫COMP26120、代做C++, Java/Python編程

        時間:2023-12-08  來源:合肥網hfw.cc  作者:hfw.cc 我要糾錯



        Academic Session: 202**4
        Lab Exercise 3: Spellchecking (Better Trade-offs)
        Duration: 4 weeks
        You should do all your work in the lab3 directory of the COMP26120 2023 repository - see Blackboard
        for further details. You will need to make use of the existing code in the branch as a starting point.
        Important: You submit this lab via a quiz on Blackboard. This will:
        1. Ask you some questions about your implementation, including the hash of the commit you want
        us to mark (see below for details of this).
        2. Ask you to upload a zip file of the python, c or java folder you have been working in.
        3. Ask you to upload a PDF report of the experiments you will run in Part C of this lab.
        4. Ask you for a statement on whether and (if relevant) how you used generative AI tools in your
        work. If you used generative AI you also have to upload a file containing transcripts of your
        usage.
        You can save your answers and submit later so we recommend filling in the questions for each part as
        you complete it, rather than entering everything at once at the end.
        • You MUST fill in and submit the quiz (at least some of it) to get marked.
        • Reports MUST be uploaded to Blackboard if you want them marked.
        • Code you want marked MUST be in the commit hash you give us in the Blackboard
        quiz.
        We will not mark anything pushed to GitLab if we don’t have the commit hash we can
        identify from Blackboard. Submission time is recorded as the time the Blackboard quiz
        is submitted, not the time you last pushed to GitLab or the last time you saved the quiz.
        Code Submission Details
        You have the choice to complete the lab in C, Java or Python. Program stubs for this exercise exist
        for each language. Only one language solution will be marked.
        Because people have had a number of issues with GitLab over the years we take a multiple redundancy approach to submission of code. This involves both pushing a commit to GitLab and uploading
        a zip of your code to Blackboard. By preference we will mark the code you submitted to GitLab but
        1
        Figure 1: Identifying the hash of your most recent commit in GitLab
        Figure 2: Identifying the hashes of previous commits in GitLab
        if we can’t find it or it doesn’t check out properly then we will look in the zip file on Blackboard.
        Please do both to maximise the chance that one of them will work.
        When you submit the assignment through Blackboard you will be asked for the hash of the commit
        you want marked. This is to make sure the TAs can identify exactly which GitLab commit you want
        marked.
        You can find the hash of your most recent commit by looking in your repository on GitLab as
        shown in figure 1.
        You can also find the hash for a previous commit by clicking on the “commits” link and then
        identifying the commit you are interested in. This is shown in figure 2.
        Note that while the full hash for commits are quite long, we only need the first 8 characters (as
        shown in the screenshots) to identify for marking.
        For this assignment, you may use built-in libraries in your chosen language but bear in
        mind that we ask you to implement hash sets - simply using the Java HashSet class (for
        instance) will not get you the marks for implementation.
        Reminder: It is bad practice to include automatically generated files in source control (e.g. your git
        repositories). This applies to object files (C), class files (Java), and compiled bytecode files (Python).
        2
        It’s not fatal if you do this but it can cause issues during marking so try not to.
        While it is fine to discuss this coursework with your friends and compare notes, the work submitted
        should be your own. In particular this means you should not have copied any of the source code,
        or the report. We will be using the turnitin tool to compare reports for similarities.
        Generative AI You may use generative AI for this assignment. If you choose to do so you should
        read the guidance notes and watch the guidance video in the Blackboard folder for the lab. At the
        start of the quiz you are asked to make a statement about your use of generative AI. If you leave
        this blank it will be understood as a declaration that you have not used generative AI
        for the coursework. The statement should make the following things clear:
        1. Which generative AI tool you used.
        2. What parts of the coursework you used it for: if you used it for the code was it for all the code
        or only some of it? If just some, which bits? If you used it to help you answer quiz questions,
        which quiz questions? if you used it to help with the report, which parts of the report?
        3. What steps you took to ensure that the generative AI tool had provided correct information.
        4. What evidence there is either in the statement, or in your coursework submission, that demonstrates you have met the coursework learning outcomes.
        If you used generative AI you should also upload a zip file containing the transcripts of your usage in
        question 2 of the Blackboard quiz.
        3
        Learning Objectives
        By the end of this lab you will be able to:
        • Explain the role of the hash function in the implementation of hash tables and describe at least
        one hash collision strategy.
        • Explain the basic structure of a binary search tree and the basic operations on this structure.
        • Produce C, Java, or Python code to implement the above concepts.
        • Evaluate experimentally the behaviour of hash sets and binary search trees.
        Introduction
        The aim of this exercise is to use the context of a simple spell-checking program to explore the hash
        table and binary search tree data structures.
        This exercise builds on work in Labs 1 and 2. If you have not done these labs we recommend you
        take a look at them and their model solutions in order to familiarise yourself with the spell-checking
        program and our expectations for creating and writing up experiments.
        Getting the Support Code
        You should be able to see a lab3 folder in your Gitlab that contains all the support files for this lab.
        You can find instructions for accessing your gitlab on Blackboard.
        Data Structures
        The spell-checking program stores a dictionary of words using a Set datatype. There are two data
        structures used to implement this Set datatype in this lab. In Lab 1 we introduced this datatype
        but we repeat that information here. You may also need to look online (e.g. the Wikipedia page) to
        complete your knowledge.
        Set. This is an abstract datatype that is used to store the dictionary of words in our spell-checking
        program. The operations required for this application are:
        • find whether a given value (i.e. string, alphabetic word) appears in the set.
        • insert a given value in the set. Note that there is no notion of multiplicity in sets - a value
        either appears or it does not. Therefore, if insert is called with a duplicate value (i.e. it is
        already in the set) it can be ignored.
        There would usually also be a remove function but this is not required for this application. We also
        include two further utility functions, which we ask you to implement, that are useful for printing what
        is happening :
        • print set to list the contents of a Set.
        • print stats to output statistical information to show how well your code is working.
        You should have already used a dynamic array to implement this dataype in Lab 1.
        In this exercise we use hash tables and binary search trees to implement the Set datatype. These
        have been introduced in lectures and we describe these briefly below. You may want to look at the
        recommended textbooks or online to complete your knowledge.
        4
        Hash Table. The underlying memory structure for a hash table is an array. A hash function is
        then used to map from a large ‘key’ domain to the (much smaller) set of indices into the array, where
        a value can be stored. When two values in the input domain map to the same index in the array this
        is called a collision and there are multiple ways to resolve these collisions. To use a hash table to
        represent a set we make the key and value the same - the result is usually called a hash set.
        Binary Search Tree. You can think of a tree as a linked list where every node has two ‘children’.
        This allows us to differentiate between what happens in each sub-tree. In a binary search tree the
        general idea is that the ‘left’ sub-tree holds values smaller than that found in the current node, and
        the ‘right’ sub-tree holds larger values.
        Lab 3: Better Storage
        In Lab 1 we achieved a faster find function by first sorting the input. In this part we explore two data
        structures that organise the data in a way that should make it faster to perform the find operation.
        Part a: Hash Table Lookup
        Time Budget: Students generally find Part a of this lab more challenging than Part b. If you have
        not attempted Lab 1, then you should budget around 1 hour for familiarising yourself with the code
        stubs we provide, the spell-checking program, command line flags, test harness and set data-type
        implementation. You might want to look at the sample answers to Lab 1 to assist with this. You
        should then budget 2-3 hours for the implementation itself. If you have spent more than 4 hours
        on Part a, we recommend skipping ahead to Part b and only returning to Part a if you have time
        remaining.
        So far our solution to a fast find function has been sorting the input and in Part b we will look at
        storing it in a sorted order. In this part you will take a different approach that relies on a hash
        function to distribute the values to store into a fixed size array.
        In this part you need to edit the program stub in your chosen language for hash sets. You should:
        1. Complete the insert function for hash sets. What should you do if the value to be inserted
        already exists? Hint: this is representing a set.
        2. Complete the find function. Importantly, it should follow the same logic as insertion.
        3. Implement the print set function – This should print out a sensible representation of the hash
        set when called with the −vvv flag.
        4. Implement the print stats function – This should print out a sensible set of statistics. This
        should include the total number of collisions (including those incurred when re-inserting after a
        rehash), total number of rehashes and the average number of collisions per access. print stats is
        automatically called by the spell-checking program on completion. We will look at this output
        so do not change or disable it. This will require some extra fields in the class (Java, Python) or
        node struct (C).
        You can find a discussion of how to test your code in the section of these instructions on Testing (after
        Part b).
        The hash-value(s) needed for inserting a string into your hash-table should be derived from the
        string. For example, you can consider a simple summation key based on ASCII values of the individual characters, or a more sophisticated polynomial hash code, in which different letter positions are
        associated with different weights. (Warning: if your algorithm is too simple, you may find
        that your program is very slow when using a realistic dictionary and causes issues when
        5
        you perform your experimental evaluation.)
        We want you to implement an open addressing hashing strategy so that collisions are dealt
        with by using a collision resolution function. The simplest of these is linear probing, the most
        straightforward strategy but you could also choose to implement quadratic probing or double hashing. To understand what is going on you should add code to count the number of collisions
        so you can calculate the average per access in print stats. You are welcome to implement whatever
        open addressing strategy you like.
        The program stubs allow you to implement several collision strategies and hash functions which
        can be accessed by modes already given in the stubs. The implementation you want us to mark
        should run in mode 0. You may find this useful if you are attempting the extension exercise in the
        report. More details of these are contained in the language specific instructions. Do not change these
        since they are used in our testing processes. Write your code so that you can use the −m parameter,
        which sets the mode.
        An issue with open addressing is what to do when it is not possible to insert a value. To make things
        simple to begin with you can simply fail in such cases. Your code should keep the num entries field
        of the table up to date. Your insert function should check for a full table, and exit the program
        cleanly should this occur. Once this is working you should increase (double?) the hash table size
        when the table is getting full and then rehash into a larger table. Implement this resize and
        rehash functionality. When the program is called using the −vvv flag it should print out
        a message when it rehashes and then call the print set function to show the new state of
        the hash set once rehashing completes. In C, beware of memory leaks!1
        Mod of negative numbers:
        Most of you are likely to use the mod operator in your hash functions by doing something
        like
        h = x % size
        to try to get a value of h that is between 0 and size - 1.
        If x is negative then the behaviour of this operator varies by language: in particular in C and Java it will return a negative number. See the discussion at
        https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/
        You might think that x cannot be negative. However, if you are adding or multiplying large
        numbers together (e.g., when hashing a string) then you are likely to get integer overflow.
        In hashing, we don’t care about such overflows as it is deterministic and just adds to the
        ‘chaos’ a bit but Java, for instance, may throw ArrayIndexOutOfBoundsException if a
        negative number is passed to the hash data structure, if you have not accounted for this
        possibility in your implementation.
        In C you could use an unsigned integer to make sure that you only have positive numbers.
        Once you have completed this implementation you should look at the Blackboard submission form
        where you will be asked the following questions about Part a.
        1. What did you implement as your hash function? How does it work? How “good” is it? (No
        more than 100 words).
        1One can also get memory leaks in memory managed languages but these are more semantic e.g. preserving a
        reference to something you will never use again.
        6
        2. How do find and insert for hash sets work? How have you implemented them? You do not need
        to discuss the details of your collision resolution strategy here (this is the next question) but
        should cover the basics of how a word is inserted into the dictionary and how a word is found
        (or not) in the dictionary. (No more than 100 words, not counting any code snippets included).
        3. What is your collision resolution strategy? How does it work? How have you implemented
        it? Illustrate your explanation with a sample of your code - this may be tided up a bit for
        presentation but should clearly be the same as the code you have submitted (No more than 200
        words, not counting the code snippet).
        4. Did you implement rehashing? If so briefly explain your implementation including explaining
        when you choose to rehash. (No more than 200 words, not counting any code snippet included).
        Part b: Binary Search Tree Lookup
        Time Budget: This part of the lab should take you about 2 hours to complete. If it is taking you
        more than 3 hours and you have completed a basic hash set implementation for Part a you might
        want to skip forward and do the first section of Part c (which only needs the hash set implementation)
        before returning to this. If you have not completed any of Part a it is worth persevering with this to
        get it working before moving on to Part c where you should focus on those sections relating to binary
        trees.
        In this part you need to edit the program stub in your chosen language for binary search trees.
        You should:
        1. Complete the insert function by comparing the value to insert with the existing value.
        2. Complete the find function. Importantly, it should follow the same logic as insertion.
        3. Implement the print set function – This should print out a sensible representation of the hash
        set when called with the −vvv flag.
        4. Implement the print stats function – This should print out a sensible set of statistics, including
        the height of the tree and average number of comparisons per insert and find. print stats is
        automatically called by the spell-checking program on completion. We will look at this output
        so do not change or disable it.
        Once you have completed this implementation you should look at the Blackboard submission form
        where you will be asked the following question about Part b.
        1. How do find and insert for binary trees work? How have you implemented them? Include your
        code for insert and find and relate your explanation to this code. (No more than 200 words, not
        including the code snippet(s))
        In this lab exercise we will stop there with trees. However, it is worth noting that this implementation
        is not optimal. What will happen if the input dictionary is already sorted? In the next lab we will be
        exploring self-balancing trees.
        Testing
        There are instructions in the individual language sections explaining how to test your algorithm on a
        single example and you should try that first. Once that is done you can test them on some test suites.
        We have provided a test script, test.py in the top level directory of the COMP26120 2023 repository and some data in the data directory. test.py takes up to five arguments: the first is the language
        you are using; the second is the test suite (simple, large and collision tests are provided but
        you can create more); the third is the data structure (bstree or hashset) you are testing; the fourth
        7
        (optional, defaults to 0) is the mode you want to test; and the fifth (optional) is the initial hash set
        size (only applicable for the hashset).
        For example, to run the simple tests for mode 2, your language is Java and you want to test your
        binary tree implementation, you should run
        ./python3 test.py java simple bstree 2
        You might want to edit the script to do different things but we will use the one provided when marking
        your code so make sure it runs with this. We have also provided a large test (replace simple by
        large above) which will take a lot longer to run (see help box below). You will need to unzip the
        files by running unzip henry.zip in the data/large directory. This should be particularly useful to
        check if you have implemented rehashing correctly.
        Finally we have provided some collision test tests. These are dictionaries containing exactly
        five items. If you run them with an initial hash set size of 5 it is highly likely that at least one insert
        will result in a collision and this will allow you to check if your collision resolution mechanism is
        working correctly. These can also be used to test if your rehashing implementation is working.
        For example, to run the collision tests for mode 0 with python as the language, you should run
        ./python3 test.py python collision_tests hashset 0 5
        This will run your hash set implementation on these tests with an initial hash set size of 5.
        In general, to be sure that you collision resolution strategy is working, we would recommend adding
        a message that gets printed out when run in verbose mode (e.g., with the -vv flag) when a collision
        occurs, possibly including printing out the set after the element is inserted, and then running tests
        manually using the instructions in the language specific sections to make sure at least one of your
        tests is encountering a collision and it is handled correctly when it does.
        Failing Tests:
        If the tests are failing there are two possible explanations. Either your implementation is
        wrong or the test script is being more picky than you thought.
        For the first reason, make sure you understand what you expect to happen. Create a
        small dictionary and input file yourself and test using these or use one of the individual
        simple tests. There are nine of these and you will find them organised in the data/simple
        directory where each sub-directory contains an example dictionary, input file and the
        answer expected by the test. Most of these are small and easy to understand. Remember
        that the program should print out all of the words that are in the input file but not in the
        dictionary. Make sure that your code is connected up properly in find so that it returns
        true/false correctly.
        For the second reason, make sure that you don’t print anything extra at verbosity level 0. If
        you look at test.py you’ll see that what it’s doing is comparing the output of your program
        between “Spellchecking:” and “Usage statistics:” against some expected output. It runs the
        program at verbosity level 0 and expects that only spelling errors are output in-between
        those two points.
        As part of marking, we will run your implementations on the simple and collision tests test
        suites and may also run them on the large test suite if we feel additional testing is required.
        Part c: Making a Comparison
        Time Budget: If you have not attempted Lab 2 then you should budget 1 hour to familiarise
        yourself with the concept of running experiments, generating data and so on. You should then expect
        8
        this part of the lab should take you 2-3 hours to complete. However, actually generating data and
        running experiments is likely to take more time than that, but you can leave your computer running
        and do something else while this is happening – particularly if you write scripts to automate data
        generation and run experiments. You should plan your time with this in mind. We recommend writing
        up the experiments as you go so you have something to submit if you run out of time.
        You should now perform an experimental evaluation like the one you did for Lab 2. We want to
        address the question.
        Under what conditions are your different implementations of the dictionary data structure
        preferable?
        Note that for hash sets the initial size of the data structure will potentially play a role. You may
        want to design your experiment so this is not a concern. You may also find it useful to correlate your
        findings with statistics such as the number of collisions in the hash table.
        We recommend you read, if you have not already done so, the Lab 2 instructions on how to generate
        inputs for your experiments. You may also (if possible) reuse the inputs you generated as part of that
        lab as part of your evaluation here in order to save time. Note that generation of inputs is likely to
        take some time (hours for large dictionaries and queries), so you should plan your work so you can
        set a generation script running and then leave it while you do something else.
        You should perform two experiments for your evaluation and then draw a conclusion that answers
        the question above. You can then devise and run a third optional (set of) experiments as an extension
        exercise.
        The two experiments you should perform should answer the questions:
        1. Does your implementation of insert for hash sets work as expected in the average case in terms
        of complexity.
        2. Does your implementation of insert for binary trees work as expected in the average case in
        terms of complexity.
        Important: In your hypothesis, experimental design and results you should be very careful to
        consider whether you are discussing or measuring a single insertion, or several insertions and you
        should be consistent around this throughout. Of course, you can measure several insertions and from
        that calculate the cost of a single insertion, but you should be clear about this.
        Your presentation of each experiment should have four sections – these are likely to be very similar
        for each experiment:
        Hypothesis You should state, as a hypothesis, what you expect the performance to be in big O
        notation and then write a short paragraph explaining why you believe this to be the case, based
        on the theoretical complexities of insert for the data structure. Be careful about how you present
        big O notation. In particular O is not a function so it doesn’t make sense to perform arithmetic
        with it such as
        O(n)
        n
        or
        O(n) × n
        (This section should be about 200 words and take up no more than half a page in your report).
        Experimental Design You should describe how you designed the experiment. This should include:
        • Your intended independent variable – this is the thing you are varying.
        • Your intended dependent variable – this is the thing you are measuring whose value your
        hypothesis says depends upon the value of the independent variable.
        9
        • Anything that you could vary but you are not – for instance to avoid confusing your results
        with other variables that might also influence performance.
        • What input data you generated and how many input files you used for each value of your
        independent variable.
        • What program you ran on what inputs. How many times you ran it on each input.
        • What you measured and how.
        You should also mention anything else you think is relevant that will help your marker judge
        how well you designed your experiment to test your hypothesis. (This section should be about
        500 words and take up no more than a page in your report – not counting any scripts included
        as appendices).
        Results You should present your results, ideally as a graph showing a best fit line. If you do any
        processing on results, such as generating best fit lines, computing averages, etc., then these
        should be described. Raw data should be presented in an appendix or included with your code
        submission. You should state clearly whether your hypothesis was confirmed or refuted (or it is
        equivocal or difficult to tell). If your hypothesis was refuted you should discuss why that might
        be (e.g., incorrect implementation of insert, some problem with the experimental design) and
        sketch how you might investigate further. (This section should be about 200 words and take up
        no more than half a page in your report – not counting graphs and tables).
        Note that it is more important to be able to clearly explain your experimental design (justifying
        your decisions) than it is to have excellent results. The answers to the wrong question are useless.
        Top marks can be obtained for Part c for a well-designed experiment that has disappointing results
        (either disproving a hypothesis or just being unclear in what they show), so long as the conclusions
        you draw make it clear you are aware of the issues with the results.
        Disappointing/Surprising Results:
        While you can get full marks for results that disprove your hypothesis, these often are
        indicative of issues with the implementation or, if you are using them, with any scripts
        written to automate the experiments. Therefore it is alway worthwhile double-checking these
        things since problems with these will effect your marks. The commonest implementation
        issue we see here is that find and/or insert are implemented in a way that returns the correct
        answer but does more work than necessary. The commonest issue with automation scripts
        is that they are not running the program at all, trying to run it on non-existent or empty
        input files, or causing it to crash somehow. If your results seem to show that your program
        takes the same amount of time on all inputs (particularly if that time is very short) then
        it is worth double-checking that your automation script is actually running the program
        correctly.
        Once you have written up your experiments you should write two further sections.
        Conclusion Use your results to address the initial question: when should you use your hash set
        implementation and when should you use your binary tree implementation? You do not need to
        validate this experimentally since for some implementations this is likely to take a lot of time,
        even with a good implementation on a fast machine. Theoretically, there is a crossover point
        where one becomes better than the other that you should be able to calculate from your existing
        results so even if you can’t run large enough experiments to find this point we expect to see the
        answer you have calculated.
        Data Statement It is good practice to inform readers where they can find your data and code in
        order to check your results and re-run experiments for themselves. This should appear in a Data
        statement.
        10
        We’ve asked you not to include large input files in your gitlab, but you should include any scripts
        you created - for instance to generate input files or run experiments, and the“raw” output data
        - e.g., a table (e.g., as a .csv file or spreadsheet) of independent variable values matched with
        dependent variable values. You may also include this as an appendix in the report.
        The data statement should clearly state where all the input data (if included), scripts and output
        data can be found.
        Hints
        • You should be able to do all of the analysis you need to do for this whole lab using a simple
        spreadsheet application. However the unix utility gnuplot and the python matplotlib library
        also both allow you to create plots from data and fit lines to those plots.
        • As the numbers are all small you might want to count n in lots of 10k e.g. for an input of 10,000
        say it’s n = 1, for 20,000 say it’s n = 2 etceteras. Alternatively, you may wish to measure time
        in milliseconds rather than seconds. This is just to make numbers look more sensible.
        • You can use the UNIX time utility to measure running time. This gives you times for real,
        user and sys. The most accurate way to judge the running time of your algorithm is to take
        the sum or user and sys (real includes the time when other resources might be using your
        CPU and can’t account for computation time across multiple cores).
        • You may want to plot/calculate average time per insert, rather than plotting the total time
        taken to build the dictionary and query it for words. When doing this (particularly for Java) be
        aware that there may be a fixed “overhead” time that is independent of the size of dictionary
        or query – if this is the case and you are calculating averages you may see that the average time
        appears to get better as the dictionary size increases not worse as you would expect. If this is
        happening, you may want to start by plotting a graph of total time and determining where this
        crosses the origin, in order to give you a guess at overhead, and then deduct this start up value
        from the total time before calculating the average. If you do this, make sure to document it in
        your report.
        • If you are convinced your implementation is correct but your results don’t seem to be as expected,
        particularly if you are looking at an overhead that is making time per insert hard to calculate,
        you might want to investigate correlations between other factors – e.g., the relationship between
        dictionary size and tree height for binary trees to see if that can illuminate what is happening.
        Report Extension. As an extension activity you can include a third (set of) experiment(s) in your
        report. This experiment can be to explore any aspect of the practical complexity of your implementation of hash sets or binary trees that you wish – good things to look at include: query time; the other
        collision resolution strategies for hash sets (if you have implemented them); other hash functions;
        and the effect of rehashing on the performance of hash sets, but there are many opportunities here.
        Note that you can get **% of the marks for this coursework without attempting the extension. This
        is for students who wish to stretch themselves and should not be attempted unless the rest of the
        coursework has been completed in good time.
        Instructions for C Solutions
        If you intend to provide a solution in C you should work in the lab3/c directory of the COMP26120 2023
        repository. All program stubs and other support files for C programs can be found in this directory.
        The completed programs will consist of several “.c” and “.h” files, combined together using make.
        You are given a number of complete and incomplete components to start with:
        11
        • global.h and global.c - define some global variables and functions, including the verbosity level.
        • speller.c - the driver for each spell-checking program, which:
        1. reads strings from a dictionary file and inserts them in your data-structure
        2. reads strings from a second text file and finds them in your data-structure
        - if a string is not found then it is assumed to be a spelling mistake and is reported to the
        user, together with the line number on which it occurred
        3. processes input flags to set the dictionary, query file, size, mode and verbosity level to be
        used.
        You should not need to change speller.c – if you do, make sure that the input flags for running
        the program continue to work as specified below.
        • set.h - defines the generic interface for the data-structure
        • hashset.h and hashset.c - includes a partial implementation for hash sets that you need to
        complete in Part a.
        • bstree.h and bstree.c - includes a partial implementation for binary search trees that you need
        to complete in Part b.
        Note: The code in speller.c that reads words treats any non-alphabetic character as a
        separator, so e.g. “non-alphabetic” would be read as the two words “non” and “alphabetic”.
        This is intended to extract words from any text for checking (is that “-” a hyphen or a
        subtraction?) so we must also do it for the dictionary to be consistent. This means that
        your code has to be able to deal with duplicates i.e. recognise and ignore them. For example,
        on my PC /usr/share/dict/words is intended to contain **9,829 words (1 per line) but is
        read as 526,065 words of which 418,666 are unique and 107,399 are duplicates.
        Running your code
        To test your implementation, you will need a sample dictionary and a sample text file with some text
        that contains the words from the sample dictionary (and perhaps also some spelling mistakes). You
        are given several such files in the data directory of the COMP26120 2023 repository, and you will
        probably need to create some more to help debug your code. These files will need to be copied to
        the directory where you are working or you will need to set your PATH so that your program can be
        executed from anywhere.
        You should also test your program using a larger dictionary. One example is the Linux dictionary
        that can be found in /usr/share/dict/words.
        Compile and link your code using make hashset (for part a), or make bstree (for part b). These will
        create executables called speller hashset and speller bstree respectively.
        When you run your spell-checker program, you can:
        • use the −d flag to specify a dictionary.
        • (for part a) use the −s flag to specify a hash table size: try a prime number greater than twice
        the size of the dictionary (e.g. 1,000,003 for /usr/share/dict/words). Do not hard code the
        dictionary size into your code. You should let it be set by the size argument to the
        initialize set function.
        12
        • use the −m flag to specify a particular mode of operation for your code (e.g. to use a particular
        hashing algorithm). You can access the mode setting by using the corresponding mode variable
        in the code you write. Note that the modes you should use have already been specified in
        hashset.h. You should not need to use this unless you want to investigate additional hash
        functions or addressing strategy.
        • use the −v flag to turn on diagnostic printing, or −vv for more printing (or −vvv etc. - the
        more ”v”s, the higher the value of the corresponding variable verbose). We suggest using this
        verbose value to control your own debugging output. You can access the verbosity level as the
        global verbose. This is 0 by default and will be set to 1, 2 or 3 by use of the −v, −vv and −vvv
        flags.
        So the general form of a call to your program should be
        speller hashset -d <dictionary> -s <size> -m <mode> <input-file>
        For instance to run your hash set program on the third of the simple test cases, in mode 1, with a
        medium level of verbosity, you would call:
        speller hashset -d ../data/simple/3/dict -m 1 -vv ../data/simple/3/infile
        NOTE IF CUTTING AND PASTING THIS that the underscores often get lost when you cut and
        paste.
        Instructions for Java Solutions
        If you intend to provide a solution in Java you should work in the lab3/java directory of the
        COMP26120 2023 repository. The completed programs will form a Java package called comp26120.
        You can find this as a sub-directory of the java directory. All program stubs and other support files
        for Java programs can be found in this directory but you will need to copy in either your solution or
        the model solutions for Lab 1 into this directory – this should be the file darray.java.
        You are given a number of complete and incomplete components to start with:
        • speller config.java - defines configuration options for the program, including the verbosity level.
        • speller.java - the driver for each spell-checking program, which:
        1. reads strings from a dictionary file and inserts them in your data-structure
        2. reads strings from a second text file and finds them in your data-structure
        - if a string is not found then it is assumed to be a spelling mistake and is reported to the
        user, together with the line number on which it occurred
        3. processes input flags to set the dictionary, query file, size, mode and verbosity level to be
        used.
        You should not need to change speller.java – if you do, make sure that the input flags for running
        the program continue to work as specified below.
        • set.java - defines the generic interface for the data-structure
        • set factory.java - defines a factory class to return the appropriate data structure to the program.
        • hashset.java -includes a partial implementation for hash sets that you need to complete in Part
        a.
        • speller hashset.java - sub-classes speller for use with hashset.
        13
        • bstree.java - includes a partial implementation for binary search trees that you need to complete
        in Part b. You will need to edit this.
        • speller bstree.java - sub-classes speller for use with bstree.
        Note: The code in speller.java that reads words treats any non-alphabetic character as a
        separator, so e.g. “non-alphabetic” would be read as the two words “non” and “alphabetic”.
        This is intended to extract words from any text for checking (is that “-” a hyphen or a
        subtraction?) so we must also do it for the dictionary to be consistent. This means that
        your code has to be able to deal with duplicates i.e. recognise and ignore them. For example,
        on my PC /usr/share/dict/words is intended to contain **9,829 words (1 per line) but is
        read as 526,065 words of which 418,666 are unique and 107,399 are duplicates.
        Running your code
        To test your implementation, you will need a sample dictionary and a sample text file with some text
        that contains the words from the sample dictionary (and perhaps also some spelling mistakes). You
        are given several such files in the data directory of the COMP26120 2023 repository, and you will
        probably need to create some more to help debug your code. These files will need to be copied to
        the java directory or you will need to set your CLASSP AT H so that your program can be executed
        from anywhere.
        You should also test your program using a larger dictionary. One example is the Linux dictionary
        that can be found in /usr/share/dict/words.
        Compile your code using javac *.java. This will create an executable called speller bstree.class
        (for Part a) and speller hashset.class (for Part a). To run your program you should call java
        comp26120.speller bstree sample-file and java comp26120.speller hashset sample-file respectively. Note that you will either need to set your CLASSP AT H to the java directory of the
        COMP26120 2023 repository or call java comp26120.speller bstree sample-file (resp. java
        comp26120.speller hashset sample-file) in that directory.
        When you run your spell-checker program, you can:
        • use the −d flag to specify a dictionary.
        • (for part a) use the −s flag to specify a hash table size: try a prime number greater than twice
        the size of the dictionary (e.g. 1,000,003 for /usr/share/dict/words). Do not hard code the
        dictionary size into your code. You should let it be set by config.init size field of
        the config object supplied as an argument to the hashset constructor.
        • use the −m flag to specify a particular mode of operation for your code (e.g. to use a particular
        sorting or hashing algorithm). You can access the mode setting by using the corresponding mode
        variable in the code you write. Note that the modes you should use have already been specified
        in hashset.java. You should not need to use this unless you want to investigate additional hash
        functions or addressing strategy.
        • use the −v flag to turn on diagnostic printing, or −vv for more printing (or −vvv etc. - the
        more ”v”s, the higher the value of the corresponding variable verbose). We suggest using this
        verbose value to control your own debugging output. We suggest using this verbose value to
        control your own debugging output. You can access the verbosity level as the verbose field of
        the hashset object. This is 0 by default and will be set to 1, 2 or 3 by use of the −v, −vv and
        −vvv flags.
        14
        So the general form of a call to your program should be
        java comp26120.speller hashset -d <dictionary> -s <size> -m <mode> -v <input-file>
        For instance to run your hash set program on the third of the simple test cases, in mode 1, with a
        medium level of verbosity, you would call:
        java comp26120.speller hashset -d ../data/simple/3/dict -m 1 -vv ../data/simple/3/infile
        NOTE IF CUTTING AND PASTING THIS that the underscores often get lost when you cut and
        paste.
        Instructions for Python Solutions
        If you intend to provide a solution in Python you should work in the lab3/python directory of the
        COMP26120 2023 repository. All program stubs and other support files for Python programs can be
        found in this directory but you will need to copy in either your solution or the model solutions for
        Lab 1 into this directory – this should be the file darray.py. You will need to use Python 3.
        You are given a number of complete and incomplete components to start with:
        • config.py - defines configuration options for the program, including the verbosity level.
        • speller.py - the driver for each spell-checking program, which:
        1. reads strings from a dictionary file and inserts them in your data-structure
        2. reads strings from a second text file and finds them in your data-structure
        - if a string is not found then it is assumed to be a spelling mistake and is reported to the
        user, together with the line number on which it occurred
        You should not need to change speller.py – if you do, make sure that the input flags for running
        the program continue to work as specified below.
        • set factory.py - defines a factory class to return the appropriate data structure to the program.
        • hashset.py - includes a partial implementation for binary search trees that you need to complete
        in Part a. You will need to edit this.
        • speller hashset.py - front end for use with hashset which then calls the functionality in speller.py.
        • bstree.py - includes a partial implementation for binary search trees that you need to complete
        in Part b. You will need to edit this.
        • speller bstree.py - front end for use with bstree which then calls the functionality in speller.py.
        Note: The code in speller.py that reads words treats any non-alphabetic character as a
        separator, so e.g. “non-alphabetic” would be read as the two words “non” and “alphabetic”.
        This is intended to extract words from any text for checking (is that “-” a hyphen or a
        subtraction?) so we must also do it for the dictionary to be consistent. This means that
        your code has to be able to deal with duplicates i.e. recognise and ignore them. For example,
        on my PC /usr/share/dict/words is intended to contain **9,829 words (1 per line) but is
        read as 526,065 words of which 418,666 are unique and 107,399 are duplicates.
        15
        Running your code
        Important: To run the provided program stubs in python2.7 you will need to install the enum34
        package. You can do this from the UNIX command line. We advise that you use Python 3 instead.
        To test your implementation, you will need a sample dictionary and a sample text file with some
        text that contains the words from the sample dictionary (and perhaps also some spelling mistakes).
        You are given several such files in the data directory of the COMP26120 2023 repository, and you
        will probably need to create some more to help debug your code. These files will need to be copied
        to the python directory or you will need to set your P Y T HONP AT H so that your program can be
        executed from anywhere.
        You should also test your program using a larger dictionary. One example is the Linux dictionary
        that can be found in /usr/share/dict/words.
        To run your program you should call python3 speller hashset.py sample-file (where sample −
        f ile is a sample input file for spell checking) for Part a and python3 speller bstree.py sample-file
        for Part b. Note that you will either need to set your P Y T HONP AT H to the python directory of
        the COMP26120 2023 repository or call python3 speller hashset.py sample-file (resp. python3
        speller bstree sample-file) in that directory.
        When you run your spell-checker program, you can:
        • use the −d flag to specify a dictionary.
        • (for part a) use the −s flag to specify a hash table size: try a prime number greater than twice
        the size of the dictionary (e.g. 1,000,003 for /usr/share/dict/words). Do not hard code the
        dictionary size into your code. You should use self.hash table size which is created
        from the configuration options when the hash set object is created by init .
        • use the −m flag to specify a particular mode of operation for your code (e.g. to use a particular
        sorting or hashing algorithm). You can access the mode setting by using the corresponding mode
        variable in the code you write. Note that the modes you should use have already been specified
        in hashset.py. You should not need to use this unless you want to investigate additional hash
        functions or addressing strategy.
        • use the −v flag to turn on diagnostic printing, or −vv for more printing (or −vvv etc. - the
        more ”v”s, the higher the value of the corresponding variable verbose). We suggest using this
        verbose value to control your own debugging output. You can access the verbosity level as the
        self.verbose field of the hashset object. This is 0 by default and will be set to 1, 2 or 3 by use
        of the −v, −vv and −vvv flags.
        So the general form of a call to your program should be
        python3 speller hashset.py -d <dictionary> -s <size> -m <mode> <input-file>
        For instance to run your hash set program on the third of the simple test cases, in mode 1, with a
        medium level of verbosity, you would call:
        python3 speller hashset.py -d ../data/simple/3/dict -m 1 -vv ../data/simple/3/infile
        NOTE IF CUTTING AND PASTING THIS that the underscores often get lost when you cut and
        paste.
        Marking Rubric
        This coursework is worth 20% of your final mark for COMP26120. This means each mark in this
        mark scheme is worth 0.5% of your final mark for the module.
        16
        In the rubric below, the marks that an item is worth are slightly approximate due to the way
        Blackboard allocates percentages across sections. In general unsatisfactory performance will get you
        10% of the available marks for a section, satisfactory will get you 50% and excellent will get you
        100%. This can vary for items worth more than 1 mark because several criteria may be involved each
        of which can be marked unsatisfactory, satisfactory or excellent.
        Note that on Blackboard, most of the execution marks are given for the Code Upload question
        since the TA will run a bunch of tests on your code in order to determine how well it executes – we
        separate out these marks here into the sections related to specific parts of the implementation.
        If you have used generative AI and have produced excellent quality answers in any section but we
        are not convinced by your statement that you have met the learning outcomes then your mark will
        be capped at satisfactory. Depending upon how many people are affected by this we may then hold
        vivas to see if the mark should be raised.
        請加QQ:99515681 或郵箱:99515681@qq.com   WX:codehelp

        掃一掃在手機打開當前頁
      1. 上一篇:MA2552編程代寫、代做MATLAB程序
      2. 下一篇:CSC345編程代寫、代做Python語言程序
      3. 無相關信息
        合肥生活資訊

        合肥圖文信息
        挖掘機濾芯提升發動機性能
        挖掘機濾芯提升發動機性能
        戴納斯帝壁掛爐全國售后服務電話24小時官網400(全國服務熱線)
        戴納斯帝壁掛爐全國售后服務電話24小時官網
        菲斯曼壁掛爐全國統一400售后維修服務電話24小時服務熱線
        菲斯曼壁掛爐全國統一400售后維修服務電話2
        美的熱水器售后服務技術咨詢電話全國24小時客服熱線
        美的熱水器售后服務技術咨詢電話全國24小時
        海信羅馬假日洗衣機亮相AWE  復古美學與現代科技完美結合
        海信羅馬假日洗衣機亮相AWE 復古美學與現代
        合肥機場巴士4號線
        合肥機場巴士4號線
        合肥機場巴士3號線
        合肥機場巴士3號線
        合肥機場巴士2號線
        合肥機場巴士2號線
      4. 幣安app官網下載 短信驗證碼

        關于我們 | 打賞支持 | 廣告服務 | 聯系我們 | 網站地圖 | 免責聲明 | 幫助中心 | 友情鏈接 |

        Copyright © 2024 hfw.cc Inc. All Rights Reserved. 合肥網 版權所有
        ICP備06013414號-3 公安備 42010502001045

        主站蜘蛛池模板: 无码中文人妻在线一区二区三区| 国产一区二区三区四| 亚洲一区二区三区乱码A| 亚洲爆乳精品无码一区二区| 日韩免费无码一区二区三区 | 精品国产一区二区三区四区 | 立川理惠在线播放一区| 亚洲AV无码一区二区三区系列| 国产在线一区二区三区在线| 人妻无码第一区二区三区| 亚洲AⅤ无码一区二区三区在线 | 亚洲熟妇AV一区二区三区浪潮 | 亚洲日本一区二区三区在线| 一区二区三区四区在线观看视频| 乱中年女人伦av一区二区| 亚洲av无码一区二区三区天堂古代| 成人精品一区二区三区不卡免费看| 无码人妻品一区二区三区精99 | 不卡无码人妻一区三区音频| 国产成人AV一区二区三区无码| 国产精品成人一区二区| 精品国产一区二区三区香蕉事| 国产一区二区在线观看app| 色综合久久一区二区三区| 久久精品黄AA片一区二区三区| 蜜桃无码AV一区二区| 久久久久无码国产精品一区| 人妻无码一区二区三区四区| 日本一区二区三区久久| 欧洲精品一区二区三区| 激情无码亚洲一区二区三区| 亚洲综合在线一区二区三区| 福利国产微拍广场一区视频在线| 精品国产免费观看一区| 色一情一乱一伦一区二区三欧美 | 在线中文字幕一区| 福利国产微拍广场一区视频在线| 在线一区二区观看| 欧亚精品一区三区免费| 婷婷亚洲综合一区二区| 精品国产亚洲一区二区在线观看|