The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
Find the sum of all the primes below two million.
Permalink: http://projecteuler.net/problem=10
We have already worked with prime numbers in
Problem 003 and
Problem 007. We also decided that
the best way to work with prime numbers in clojure to use primes
lazy-seq
from clojure.contrib.lazy-seqs
.
Gentle reminder: Just add proper :use
to your file:
(:use [clojure.contrib.lazy-seqs :only (primes)])
Then, the simplest problem solution to use take-while
stream and reduce
for sum:
(reduce + (take-while #(< % 2000000) primes))
Again, one-line solution. Let’s run it.
Finding result on my nachine takes ~12 seconds. Small enough, but maybe somehow it can be improved?
Sieve of Eratosthenes - ancient algorithm for finding prime numbers. The best explanation how it works with picture from wiki:
Our first naive implementation:
(defn sieve-1 []
(loop [nums (set (cons 2 (range 3 2000000 2))) n 3]
(if (> n 2000000) (reduce + nums)
(recur (clojure.set/difference nums (set (range (* n n) 2000000 n))) (inc n)))))
Note: clojure.set/difference function find the difference between two sets.
This recursive implementation is even worse than our oneliner, it calculates the result in ~24 seconds.
Primary optimisation is to use (* n n)
as recursion base instead of n
. Because of there are no prime
numbers if we crossed greater than square root of maximum. Problem evolves.
(defn sieve-2 []
(loop [nums (set (cons 2 (range 3 2000000 2))) n 3]
(if (> (* n n) 2000000) (reduce + nums)
(recur (clojure.set/difference nums (set (range (* n n) 2000000 n))) (inc n)))))
~21 seconds. Still worse.
Why we increment by 1 on recursion call? If we starting our n
value from 3
we don’t need
to go over even numbers.
(defn sieve-3 []
(loop [nums (set (cons 2 (range 3 2000000 2))) n 3]
(if (> (* n n) 2000000) (reduce + nums)
(recur (clojure.set/difference nums (set (range (* n n) 2000000 n))) (+ n 2)))))
~13 seconds. Much better… than previous, still worse than first version.
If you looked at difference
documentation, take a look at its implementation under “Source”.
Nothing prevent us to pass sequence as second argument instead of set. Just removing casting to set.
(defn sieve-4 []
(loop [nums (set (cons 2 (range 3 2000000 2))) n 3]
(if (> (* n n) 2000000) (reduce + nums)
(recur (clojure.set/difference nums (range (* n n) 2000000 n)) (+ n 2)))))
~6 seconds. Wow! It’s better than original solution. That means our efforts were not wasteful.
Unfortuantely, we stopped improve our function. But if you want more, other major improvements can be done in the following areas:
primes
from clojure.contrib.lazy-seqs
.(inc n)
to (+ n 2)
, but probably there is smarter solution.(reduce +)
because it is linear algorithm. We init nums
sequence with some arithmetic progression.
Its sum can be calculated in O(1) by formula. When we compose another sequence (second argument for difference
)
its also an arithmetic progression and sum can be calculated in O(1). When doing differencem just subtract second sum
from first sum, and it will be current nums
sum.There are much more optimisations can be done to improve prime numbers performace. If you interested, read nice Christophe Grand’s post Everybody Loves The Sieve Of Erathosthenes
P.S Honestly, I am OK with 12 seconds. But 6 seconds is better. We performed improvements just to show the point if you want to improve something, probably, you can do it.
mishadoff 15 December 2012