From 6.006 Wiki
We have a couple suggestions on things that have been causing students frustration on Problem 1(d):
range vs. xrange
On newer version of Python, it doesn't matter as much, but we suggest you try out xrange instead of range if you're having problems with your running time. The difference is that for i in range(n) produces a list [0,1,2,...,n-1] and then iterates through it, while for i in xrange(n) iterates more efficiently, essentially just keeping a counter and incrementing it on each iteration.
As part of your rolling hash for Problem 1(d), you might be performing large exponentiations, and then taking the result modulo some smaller number. Beware of doing it like this:
>>> (b**e) % m
The b**e will be some ginormous number, and so the modulus operation will be slow. Instead, use Python's built-in pow function:
>>> pow(b, e, m)
(Note: NOT math.pow! They are different functions.) It uses a more efficient modular exponentiation operation, taking the mods of intermediate results rather than just doing it on the final product. (If you're interested, look up the 'square and multiply' method for modular exponentiation.)
Hopefully this helps you keep your running times for Part 1(d) reasonable. In office hours, we've seen it cut running times by a factor of 2.
Part B Question 2b
- Simplify before taking the limit
- Since we know what the final form is (right?), can we deduce what terms need to cancel out and what terms should remain in the numerator/denominator?
Part B Question 2d
- Make sure you understand what the solutions to the previous parts mean. It may give you a clue how to start this part.
- What does f(n) = theta(g(n)) mean?
- Since we know what the final answer should be, can we use that in our proof?
Staff clarifications and Errata
Saturday, October 4: Problem 1 testing
When you submit your code for Problem Set 2-A, the server will run the test script that we have given you, with slight modifications:
For Problem 1(c), only the 100-, 1000-, and 10000-character tests will be run. The 100000-character test is likely to take too long, run out of memory, or both, so don't worry if your code doesn't pass it on your machine. The server will enforce a 15-second time limit for this part. (The staff solution runs in less than 1 second.)
For Problem 1(d), all four tests will be run. The server will enforce a 180-second time limit for this part. (The staff solution runs in less than 30 seconds.)
If your code runs over time, the server will only say "runtime error", and the output of the program might contain the word "Terminated". It's kind of a cryptic error message--sorry about that. We're working to improve that.
Effective Saturday evening, if your code runs over time, the server will kill it and give you the message "time limit exceeded".
Sunday, October 5: Part 1(f) (optional)
The LaTeX template suggests that you can submit your substring5.py code online, but there is no way to do that. We don't want you to upload your code -- just describe your algorithm.
Wednesday, October 8: 2B1 (max-gap BST)
The max-gap and insert operations should take sub-linear time and you will only receive credit for those implementations (you can get linear time WITHOUT augmenting the data structure). It is possible to get a constant max-gap time while maintaining a logarithmic insert time, if it helps you think about a possible solution.
general debugging hints
- adding asserts to your code can help you to ensure that values are what you expect them to be at all times
- write your code modularly: use submethods for any non-trivial repeated or similar code.
- write unit tests for each of the submethods written
- separate complicated computations into smaller pieces which can then be asserted on.
- try to find out where the time is going, are some methods being called more often than you expect them to? do they take longer to execute than expected?