Clojure Euler: Problem 010

The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.

Find the sum of all the primes below two million.


We have already worked with prime numbers in Problem 003 and Problem 007. We also decided that the best way to work with prime numbers in clojure to use primes lazy-seq from clojure.contrib.lazy-seqs.

Gentle reminder: Just add proper :use to your file:

(:use [clojure.contrib.lazy-seqs :only (primes)])

Then, the simplest problem solution to use take-while stream and reduce for sum:

(reduce + (take-while #(< % 2000000) primes))

Again, one-line solution. Let's run it.

Finding result on my nachine takes ~12 seconds. Small enough, but maybe somehow it can be improved?

Sieve of Eratosthenes

Sieve of Eratosthenes - ancient algorithm for finding prime numbers. The best explanation how it works with picture from wiki:

Our first naive implementation:

(defn sieve-1 []
  (loop [nums (set (cons 2 (range 3 2000000 2))) n 3]
    (if (> n 2000000) (reduce + nums)
        (recur (clojure.set/difference nums (set (range (* n n) 2000000 n))) (inc n)))))

Note: clojure.set/difference function find the difference between two sets.

This recursive implementation is even worse than our oneliner, it calculates the result in ~24 seconds.

Primary optimisation is to use (* n n) as recursion base instead of n. Because of there are no prime numbers if we crossed greater than square root of maximum. Problem evolves.

(defn sieve-2 []
  (loop [nums (set (cons 2 (range 3 2000000 2))) n 3]
    (if (> (* n n) 2000000) (reduce + nums)
        (recur (clojure.set/difference nums (set (range (* n n) 2000000 n))) (inc n)))))

~21 seconds. Still worse.

Why we increment by 1 on recursion call? If we starting our n value from 3 we don't need to go over even numbers.

(defn sieve-3 []
  (loop [nums (set (cons 2 (range 3 2000000 2))) n 3]
    (if (> (* n n) 2000000) (reduce + nums)
        (recur (clojure.set/difference nums (set (range (* n n) 2000000 n))) (+ n 2)))))

~13 seconds. Much better... than previous, still worse than first version.

If you looked at difference documentation, take a look at its implementation under "Source". Nothing prevent us to pass sequence as second argument instead of set. Just removing casting to set.

(defn sieve-4 []
  (loop [nums (set (cons 2 (range 3 2000000 2))) n 3]
    (if (> (* n n) 2000000) (reduce + nums)
        (recur (clojure.set/difference nums (range (* n n) 2000000 n)) (+ n 2)))))

~6 seconds. Wow! It's better than original solution. That means our efforts were not wasteful.

Unfortuantely, we stopped improve our function. But if you want more, other major improvements can be done in the following areas:

There are much more optimisations can be done to improve prime numbers performace. If you interested, read nice Christophe Grand's post Everybody Loves The Sieve Of Erathosthenes

(str "Git" "Hub")

P.S Honestly, I am OK with 12 seconds. But 6 seconds is better. We performed improvements just to show the point if you want to improve something, probably, you can do it.

mishadoff 15 December 2012
blog comments powered by Disqus