Beginner Exercism • python

Sieve

Challenge Overview

Use the Sieve of Eratosthenes to find all the primes from 2 up to a given number.

Introduction

You bought a big box of random computer parts at a garage sale. You’ve started putting the parts together to build custom computers.

You want to test the performance of different combinations of parts, and decide to create your own benchmarking program to see how your computers compare. You choose the famous “Sieve of Eratosthenes” algorithm, an ancient algorithm, but one that should push your computers to the limits.

Instructions

Your task is to create a program that implements the Sieve of Eratosthenes algorithm to find all prime numbers less than or equal to a given number.

A prime number is a number larger than 1 that is only divisible by 1 and itself. For example, 2, 3, 5, 7, 11, and 13 are prime numbers. By contrast, 6 is not a prime number as it not only divisible by 1 and itself, but also by 2 and 3.

To use the Sieve of Eratosthenes, first, write out all the numbers from 2 up to and including your given number. Then, follow these steps:

Find the next unmarked number (skipping over marked numbers). This is a prime number.
Mark all the multiples of that prime number as not prime.

Repeat the steps until you’ve gone through every number. At the end, all the unmarked numbers are prime.

The Sieve of Eratosthenes marks off multiples of each prime using addition (repeatedly adding the prime) or multiplication (directly computing its multiples), rather than checking each number for divisibility.

The tests don't check that you've implemented the algorithm, only that you've come up with the correct primes.

Example

Let’s say you’re finding the primes less than or equal to 10.

Write out 2, 3, 4, 5, 6, 7, 8, 9, 10, leaving them all unmarked.
```
2 3 4 5 6 7 8 9 10
```
2 is unmarked and is therefore a prime. Mark 4, 6, 8 and 10 as “not prime”.
```
2 3 [4] 5 [6] 7 [8] 9 [10]
↑
```
3 is unmarked and is therefore a prime. Mark 6 and 9 as not prime (marking 6 is optional - as it’s already been marked).
```
2 3 [4] 5 [6] 7 [8] [9] [10]
  ↑
```
4 is marked as “not prime”, so we skip over it.
```
2 3 [4] 5 [6] 7 [8] [9] [10]
     ↑
```
5 is unmarked and is therefore a prime. Mark 10 as not prime (optional - as it’s already been marked).
```
2 3 [4] 5 [6] 7 [8] [9] [10]
        ↑
```

6 is marked as “not prime”, so we skip over it.

2 3 [4] 5 [6] 7 [8] [9] [10]
           ↑

7 is unmarked and is therefore a prime.

2 3 [4] 5 [6] 7 [8] [9] [10]
              ↑

8 is marked as “not prime”, so we skip over it.

2 3 [4] 5 [6] 7 [8] [9] [10]
                 ↑

9 is marked as “not prime”, so we skip over it.

2 3 [4] 5 [6] 7 [8] [9] [10]
                     ↑

10 is marked as “not prime”, so we stop as there are no more numbers to check.

2 3 [4] 5 [6] 7 [8] [9] [10]
                         ↑

You’ve examined all the numbers and found that 2, 3, 5, and 7 are still unmarked, meaning they’re the primes less than or equal to 10.

Dig Deeper

Nested Loops

def primes(number):
    not_prime = []
    prime = []
    
    for item in range(2, number+1):
        if item not in not_prime:
            prime.append(item) 
            for element in range (item*item, number+1, item):
                not_prime.append(element)
    
    return prime

This is the type of code that many people might write as a first attempt.

It is very readable and passes the tests.

The clear disadvantage is that run time is quadratic in the input size: O(n**2), so this approach scales poorly to large input values.

Part of the problem is the line if item not in not_prime, where not-prime is a list that may be long and unsorted.

This operation requires searching the entire list, so run time is linear in list length: not ideal within a loop repeated many times.

def primes(number):
    number += 1
    prime = [True for item in range(number)]
    for index in range(2, number):
        if not prime[index]:
            continue
        for candidate in range(2 * index, number, index):
            prime[candidate] = False
    return [index for index, value in enumerate(prime) if index > 1 and value]

At first sight, this second example looks quite similar to the first.

However, on testing it performs much better, scaling linearly with number rather than quadratically.

A key difference is that list entries are tested by index: if not prime[index].

This is a constant-time operation independent of the list length.

Relatively few programmers would have predicted such a major difference just by looking at the code, so if performance matters we should always test, not guess.

Set Operations

def primes(number):
    not_prime = set()
    primes = []

    for num in range(2, number+1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num*num, number+1, num))

    return primes

This is the fastest method so far tested, at all input sizes.

With only a single loop, performance scales linearly: O(n).

A key step is the set update().

Less commonly seen than add(), which takes single element, update() takes any iterator of hashable values as its parameter and efficiently adds all the elements in a single operation.

In this case, the iterator is a range resolving to all multiples, up to the limit, of the prime we just found.

Primes are collected in a list, in ascending order, so there is no need for a separate sort operation at the end.

def primes(number):
    numbers = set(item for item in range(2, number+1))
    
    not_prime = set(not_prime for item in range(2, number+1)
                    for not_prime in range(item**2, number+1, item))
    
    return  sorted(list((numbers - not_prime)))

After a set comprehension in place of an explicit loop, the second example uses set-subtraction as a key feature in the return statement.

The resulting set needs to be converted to a list then sorted, which adds some overhead, scaling as O(n log n).

In performance testing, this code is about 4x slower than the first example, but still scales as O(n).

def primes(number: int) -> list[int]:
    start = set(range(2, number + 1))
    return sorted(start - {m for n in start for m in range(2 * n, number + 1, n)})

The third example is quite similar to the second, just moving the comprehension into the return statement.

Performance is very similar between examples 2 and 3 at all input values.

Sets: strengths and weaknesses

Sets offer two main benefits which can be useful in this exercise:

Entries are guaranteed to be unique.
Determining whether the set contains a given value is a fast, constant-time operation.

Less positively:

The exercise specification requires a list to be returned, which may involve a conversion.
Sets have no guaranteed ordering, so two of the above examples incur the time penalty of sorting a list at the end.

Comprehensions

def primes(number):
    prime = (item for item in range(2, number+1) 
              if item not in (not_prime for item in range(2, number+1) 
              for not_prime in range(item*item, number+1, item)))
    return list(prime)

Many of the solutions to Sieve use comprehensions or generator-expressions at some point, but this page is about examples that put almost everything into a single, elaborate generator-expression or comprehension.

The above example uses a generator-expression to do all the calculation.

There are at least two problems with this:

Readability is poor.
Performance is exceptionally bad, making this the slowest solution tested, for all input sizes.

Notice the many for clauses in the generator.

This makes the code similar to nested loops, and run time scales quadratically with the size of number. In fact, when this code is compiled, it compiles to nested loops that have the additional overhead of generator setup and tracking.

def primes(limit):
    return [number for number in range(2, limit + 1)
            if all(number % divisor != 0 for divisor in range(2, number))]

This second example using a list-comprehension with all() is certainly concise and relatively readable, but it uses % (which the instructions ask you not to use) and the performance is again quite poor.

This is not quite a fully nested loop (there is a short-circuit when all() evaluates to False), but it is by no means “performant”. In this case, scaling with input size is intermediate between linear and quadratic, so not quite as bad as the first example.

Source: Exercism python/sieve