# Search Performance

In this assignment, you will implement a few search algorithms, including linear search and binary search, and evaluate their performance. The goal is to introduce you to the basic concepts of algorithmic complexity.

Note that it is trivial to find working code for both search algorithms online. Obviously, you should not be searching for that code while doing the assignment. However, if you __accidentally__ run across code, the multiline comment at the top of your Python file must indicate which website(s) you were on when you saw that code and also how you ended up on that website (what the search term was, what path of links you followed, etc.).

| Part          | Section                                       |
|---------------|-----------------------------------------------|
| 1 (in-lab)    | [Part 1: Search Algorithms](#part1)           |
| 1 (in-lab)    | [Check-in Instructions](#checkin)             |
| 2 (lab/home)  | [Part 2: Comparing the functions](#part2)     |
| 2 (lab/home)  | [Submission Instructions](#submission)        |

## Getting started

Create a new project named `SearchPerformance` in the `CSCI051p-Workspace` you created on your Desktop. *Double check that you are creating the project in the right place, or you will likely have trouble finding your files later.*

Then download the [starter code](searchperf.zip), and unzip the files. You should see a folder named `starter` that contains two files (`search_performance.py` and `search_performance_tester.py`). Copy both of these files into the (recently created) `SearchPerformance` folder and refresh the view in PyCharm.

**Note:** You might need to manually install the matplotlib package to generate the comparison charts using `plot_comparison` in Part 2 of this assignment. You can do so following the same method as you used to [install the Arcade package](../A3/install.html).

<a name="part1"></a>

## Part 1: Search Algorithms

Start by implementing the two search functions: `linear_search` and `binary_search`. Both functions take two arguments.
The first argument is `alist`, which is a list of `int`. The second argument is `value`, which is an `int`. Both functions return a non-negative integer `index` such that `alist[index] == value`. If no such index exists, the functions return `-1`.

#### Function 1: linear_search(alist, value)

This function goes through the elements of `alist` from first to last, checking each one to see if it equals `value`. If so, it returns the index. If every element in the list has been checked and nothing equaling `value` has been found, the function returns `-1`.

#### Function 2: binary_search(alist, value)

The binary search algorithm, and therefore this function, requires `alist` to already be sorted (make sure you understand why!). If `alist` has length `n`, then the function first checks the element in position `n/2` (rounding if necessary) to see if it equals `value`, is greater than `value`, or is less than `value`. If it is equal, then the function returns the index immediately. If the element is greater than `value`, then the function repeats the search on the first half of the list. If the element is less than `value`, then the function repeats the search on the second half of the list. If there is no way to further divide the list and `value` has not been found, then the function returns `-1`.

Note that binary search can be implemented either recursively or iteratively.

- The iterative version uses a `while` loop and two variables for keeping track of the endpoints of the portion of the list in which you're currently searching for `value`.
- The recursive version is best implemented using a helper function `binary_search_helper(alist, value, start, end)`, which searches for `value` but only for indices in the range from `start` to `end-1` (inclusive).
If you do this, the `binary_search` function will contain the single line of code:

```
binary_search_helper(alist, value, 0, len(alist))
```

Either version is fine, but make sure you think through your approach before you start to code!

#### Test cases: linear_search() and binary_search()

The `search_performance_tester.py` file in the starter includes a few simple tests to ensure that your functions work at all. You should extend those test cases to thoroughly test your functions.

#### Function 3: list_of_integers(size)

To do more thorough testing of Part 1, and to generate data for Part 2, you will need a function that generates a list of random integers with a specified size. The random integers should be in the range from `1` to `2*size`. If you want to search for a number that is guaranteed not to be in the list, you can search for `0` or `2*size + 1`.

<a name="checkin"></a>

#### Checking In

Once you have finished implementing your two search functions, demo your functions to an instructor or TA to collect your lab points. Before finding a TA or professor, make sure your functions have:

- appropriate docstrings
- good algorithm comments
- mnemonic variable names
- good use of horizontal and vertical white space

We will double check your code, ask you a few questions about it, answer any questions you have, and award your points for Part 1. This must be completed before leaving the lab. After that, you should start working on Part 2.

<a name="part2"></a>

## Part 2: Comparing the functions

You'll do two experiments comparing the running times of the two functions.
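One way to write `list_of_integers` is with the standard `random` module; this is a minimal sketch of one possible approach, not the only acceptable implementation:

```python
import random


def list_of_integers(size):
    """Return a list of `size` random integers, each in the range 1 to 2*size."""
    return [random.randint(1, 2 * size) for _ in range(size)]


# Example: a list of 10 random integers, each between 1 and 20.
# Searching this list for 0 or 21 is guaranteed to fail.
nums = list_of_integers(10)
```

A plain loop with `list.append` works just as well as the list comprehension shown here.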
To get the running time of a function, you need to import the `time` package:

```
import time
```

Now, if you want to time a function called `func` which takes no arguments, you do so as follows:

```
start = time.time()
func()
end = time.time()
elapsed_time_in_seconds = end - start
```

Note that if something runs fast, the running time might be reported using scientific notation (e.g., `2.1457672119140625e-06`, which is equal to `2.1457672119140625 * 10^-6 = 0.0000021457672119140625`). If something runs very fast, the running time might be reported as `0.0`, since there's limited precision in the timer.

#### sorted_comparison(min_size, max_size)

Create a series of lists, starting with `min_size` and doubling the list size until it reaches `max_size`. For each list, time a `linear_search` and a `binary_search` for a number that is not in the list.

As you know, binary search requires the list to already be sorted. So, for this first part you'll create lists of different lengths that are already in sorted order. You can create each list however you want:

- create a list of non-random numbers that is already sorted
- use the `list_of_integers` function, and then use `list.sort()` to sort the numbers

You'll always search for an integer that is __not__ in the list, because that's the worst case (why?).

`sorted_comparison` should return a list of three-element tuples. The first element of each tuple should be a list size. The second element of the tuple should be the running time of a linear search (for something that does not exist) on a sorted list of that size. The third element of the tuple should be the running time of a binary search (for something that does not exist) on a sorted list of that size.
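The timing pattern above also works for calls that take arguments: you put the full call between the two `time.time()` readings. Here is a small sketch that uses Python's built-in `in` operator as a stand-in for the search functions you'll write (your code will call `linear_search` and `binary_search` instead):

```python
import time

alist = list(range(1000000))  # a large sorted list: 0, 1, ..., 999999
target = -1                   # a value guaranteed not to be in the list

start = time.time()
found = target in alist       # stand-in for a search call like linear_search(alist, target)
end = time.time()

elapsed = end - start         # elapsed time in seconds, as a float
```

Because the timer has limited precision, `elapsed` may come out as `0.0` for small lists; that is expected and fine.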
For example, when I ran `sorted_comparison(2, 8)` it returned the list:

```
[(2, 2.1457672119140625e-06, 9.5367431640625e-07),
 (4, 9.5367431640625e-07, 2.1457672119140625e-06),
 (8, 1.1920928955078125e-06, 9.5367431640625e-07)]
```

#### unsorted_comparison(min_size, max_size)

Create a series of lists, starting with `min_size` and doubling the list size until it reaches `max_size`. For each list, time a `linear_search` and a sort followed by a `binary_search` for a number that is not in the list.

You also know, however, that lists aren't always sorted. So, for this second part you'll create lists of different lengths that contain random numbers. You can create each list however you want. Again, you'll always search for an integer that is not in the list.

`unsorted_comparison` should return a list of three-element tuples. The first element of each tuple should be a list size. The second element of the tuple should be the running time of a linear search (for something that does not exist) on an unsorted list of that size. The third element of the tuple should be the total running time to:

- first sort a list of that size, and then
- run a binary search (for something that does not exist) on the sorted list.

The list sizes should start at `min_size` and double each time until they reach `max_size`. For example, when I ran `unsorted_comparison(2, 8)` it returned the list:

```
[(2, 2.1457672119140625e-06, 2.86102294922e-06),
 (4, 1.1920928955078125e-06, 1.66893005371e-06),
 (8, 2.1457672119140625e-06, 1.66893005371e-06)]
```

#### main()

In your main function, you should generate two charts: one that compares the running time of linear and binary search on sorted lists with lengths in the range 2 to 1048576, and one that compares the running time of linear and binary search on unsorted lists with lengths in the range 2 to 1048576. We have provided a helper function `plot_comparison` that generates these charts.
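Both comparison functions use the same size-doubling loop. The sketch below isolates just that loop in a hypothetical helper (`doubling_sizes` is not part of the assignment; it only illustrates how the sizes progress from `min_size` to `max_size`):

```python
def doubling_sizes(min_size, max_size):
    """Return the list sizes used by the comparison functions:
    min_size, 2*min_size, 4*min_size, ..., up to and including max_size."""
    sizes = []
    size = min_size
    while size <= max_size:
        sizes.append(size)
        size *= 2  # double the list size for the next experiment
    return sizes


# For the example calls above with (2, 8), the sizes would be:
# doubling_sizes(2, 8)  →  [2, 4, 8]
```

In your actual functions, the body of the `while` loop would build a list of the current size, time the two searches, and append a `(size, time1, time2)` tuple to the result.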
#### Coding Style

Make sure that your program is properly commented:

* You should have comments at the very beginning of the file stating your name, course, assignment number, and the date.
* Each function should have an appropriate docstring, describing:
    - the purpose of the function
    - the types and meanings of each parameter
    - the type and meaning of the return value(s)
* Include other comments as necessary to make your code clear.

In addition, make sure that you have used good style. This includes:

* Following naming conventions, e.g., all variable and function names should be lowercase.
* Using good (mnemonic) variable names.
* Proper use of whitespace, including indenting and use of blank lines to separate chunks of code that belong together.

For more detailed descriptions, please review the [Python Coding Style Guidelines](python_style.html).

## Part 3: Feedback

Create a file named `feedback.txt` that answers the usual questions:

1. How long did you spend on this assignment?
2. Any comments or feedback? Things you found interesting? Things you found challenging? Things you found boring?

<a name="submission"></a>

## Submission

You will turn in four files for this assignment:

* `search_performance.py`, which contains your code
* `search_performance_tester.py`, which contains your test cases
* `search_performance.pdf`, which contains graphs and summaries of your results. The pdf file should include:
    - the graph generated by `plot_comparison` when given the results of `sorted_comparison` as the first argument.
    - the graph generated by `plot_comparison` when given the results of `unsorted_comparison` as the first argument.
    - a comparison between your experimental results and a big-O analysis of the algorithms you implemented.
    - a few sentences, grounded in the data you collected, addressing how you would advise someone choosing between using your linear search and your binary search code.
* `feedback.txt`
These should be submitted using [submit.cs.pomona.edu](http://submit.cs.pomona.edu) as described in the general [submission instructions](submit.html).

Note that we reserve the right to give you no more than half credit if your files are named incorrectly and/or your function headers do not match the specifications (including names, parameter order, etc.). Please double and triple check this before submitting!

## Grade Point Allocations

| Part      | Feature                                    | Value |
|-----------|--------------------------------------------|-------|
| Lab       | Check-in                                   | 3     |
| Execution | `linear_search`                            | 5     |
| Execution | `binary_search`                            | 5     |
| Execution | `sorted_comparison`                        | 5     |
| Execution | `unsorted_comparison`                      | 5     |
| Execution | `main`                                     | 2     |
| Execution | writeup                                    | 4     |
| Testing   | Thoroughly tests `linear_search`           | 2     |
| Testing   | Thoroughly tests `binary_search`           | 2     |
| Style     | Docstrings accurate, relevant              | 2     |
| Style     | Comments in code accurate, relevant        | 2     |
| Style     | Good use of loops and conditionals         | 2     |
| Style     | Good use of variable names and whitespace  | 2     |
| Style     | Misc                                       | 2     |
| Feedback  | Completed feedback file submitted          | 2     |