
Lucky Numbers and Science
1383 words. Time to Read: About 13 minutes.Earlier this week, Heiko Dudzus posted this challenge post on Dev.to. The challenge was this:
Write a program or script in your language of choice, that prints the lucky numbers between 1 and n. Try to make it as fast as possible for sieving lucky numbers between 1 and a million. (Perhaps it is sensible to measure the time without printing the results.)
A lucky number is defined as a number that survives the Sieve of Josephus. I’m going to quote the original post here so you get the idea.
The 1 is lucky by definition.
The successor of 1 is 2. So, every second number gets eliminated: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, …
The next number is 3. Now, every third number gets eliminated: 1, 3, 7, 9, 13, 15, 19, 21, …
Number 3 is lucky! It’s successor is 7. Now, every seventh number gets eliminated. And so on.
See the link above for more details. I advise you to give the challenge a try if you haven’t yet before you read the rest of the post, since I’m about to tell you about my solutions.
I came up with two solutions, but the best solution ended up being an interesting combination of them both. You’ll see what I mean.
Solution 1: Naively Run the Algorithm
I like the quote “Make it work, make it right, make it fast”, which, as far as I can tell, this is attributed to Kent Beck. Therefore, my first solution focused on coding things exactly like the problem was provided, the easy way, whether or not this was the fastest way.
I was all proud of myself as I ran it. And then I ran it with max = 1_000_000
. And I sat. And I sat…
This is taking too long! I could to do better.
Solution 2: Focus on the Keeps, not the Rejects
As I was experimenting and testing things out, I noticed that, at the higher values of step_size
, the naive algorithm was only rejecting a number every so often. So I thought to myself, “What if we could take the step size out of the equation and make each step take roughly the same amount of time, regardless of how many elements there were in it?” So, instead of checking each item to see if it was a reject, I calculated out the size of the chunk of non-rejected numbers and copied those into a results
array, ignoring the rejects instead of explicitly rejecting them.
This one worked great! Even on my little laptop, it ran relatively quickly (don’t worry, the benchmarks are coming). But something felt weird. My programmer senses were tingling.
I had no proof at this point, but it seemed like all of the setup and creation of a second array each “round” should cause this algorithm to perform poorly at small step_size
s. It was at this point that this ceased to be a code challenge and became a science experiment.
Solution 3: A Combination of the Two
HYPOTHESIS: There is a performance trade-off between the low and high end for my two solutions. Thus, there should be a combination of the two that is faster than either one as a stand-alone.
If there was a trade-off somewhere between the two solutions, I would be able to run a few tests and find the best possible combination. I quickly updated my lucky_numbers
method.
And now to collect some data! For sanity’s sake, I wrapped each function in a module to differentiate them.
And no science experiment is complete without plots! Unfortunately, the data visualization story (outside of the web) is not as cushy in Ruby as it is in Python. Fortunately, I found a wrapper for Python’s matplotlib
so I could stay in Ruby land for now. Note that I used a log-based scale on the X-axis to better display my results.
This is the part that is confusing to me. I’ve run the experiment multiple times and I always end up with a dip at around 1000. We’ll talk about that in a moment. Here are the benchmarks of each solution directly compared.
And the results:
▶ ruby lucky_number_timer.rb
user system total real
Version 1 794.304851 28.946642 823.251493 (826.377494)
Version 2 23.267372 33.858966 57.126338 ( 57.238392)
Version 3 19.517162 20.816684 40.333846 ( 40.470201)
Conclusion
CONCLUSION: My hypothesis was supported. A combination of the two solutions performs better than either one solution by itself. However, the reason behind this is probably not the one that I initially thought. Some further research is required to find out why this worked.
My wife teaches middle school science and I just got done helping her judge their science fair, so I’ve got experimentation on the brain. Hopefully this little experiment makes her proud!
I’m sure my solution wasn’t the best possible solution, but it gets the job done. Ruby isn’t necessarily the language of choice for high-throughput calculations (although it’s not necessarily a bad choice either). If anybody can come up with a convincing reason for why my data came out the way it did, including the dip at 1000, I’ll personally send you a shiny new trophy emoji. 🏆 Also, if you’ve got a solution that you like better than these, I want to see it. Share it! (or share a link to your solution commented on Heiko’s original post).
Author: Ryan Palo | Tags: ruby puzzle performance |Like my stuff? Have questions or feedback for me? Want to mentor me or get my help with something? Get in touch! To stay updated, subscribe via RSS