Sieving primes
TL;DR
Sieving primes is a simple task with well-understood solutions. We use it for a close study of vector-programming techniques in q, with particular attention to the iteration operators. We start with purely functional solutions.1
Finding prime numbers is a compute-intensive task familiar to computer-science students. It is typically tackled with tightly-iterating algorithms in a language close to the hardware, such as C.
It might appear an unrewarding topic for a vector-programming language. We explore it here to study efficient vector solutions in a well-understood problem domain. We are interested not so much in the solutions themselves, but in illustrating the process of developing them. We also see how terse code helps us formulate and evaluate alternative algorithms.
Functional Programming was born from the first Iversonian language, APL, so we begin with purely functional solutions. We examine two closely related questions:
- Is \(x\) prime?
- What are the primes lower than \(N\)?
A prime number (or a prime) is a natural number greater than 1 that is not a product of two smaller natural numbers.2
Ancestor Programming Language
We begin with a teaser.
APL was the first language derived from Iverson Notation, and is an ancestor language of q. Many useful algorithms have short expressions in APL, where they are known as idioms.
An APL3 idiom for finding the primes below 100:
/∘⍳∘⍴⍨2=+⌿0=∘.|⍨⍳100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Those are the risks.
But the APL translates neatly into q, so we’ll look at what it’s doing, then move on with q.
/∘⍳∘⍴⍨ 2= +⌿ 0= ∘.|⍨ ⍳ 100
where 2= sum 0= {x mod\:/:x} til 100
Now we begin to see our way.
The modulo function is |
in APL, mod
in q.
The APL operator ∘.
derives a Cartesian product function from its argument, as do in q the iterators \:/:
.
The APL Commute operator4 ⍨
is a combinator that does the work in q of {x f\:/:x}
where f
is a binary.
q)where 2=sum 0={x mod\:/:x}til 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
{x mod\:/:x}til 100
evaluates modulo for every modulus in the range5 \([0,99]\).
q){x mod\:/:x}til 20
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1
0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1
0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3 4 5
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 0 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 7 8 9 10 11 12 0 1 2 3 4 5 6
0 1 2 3 4 5 6 7 8 9 10 11 12 13 0 1 2 3 4 5
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 1 2 3 4
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 0 1 2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 0 1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 0
mod[;0]
, the next of mod[;1]
.
q)til[20]mod 0
0N 0N 0N 0N 0N 0N 0N 0N 0N 0N 0N 0N 0N 0N 0N 0N 0N 0N 0N 0N
q)til[20]mod 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 mod
and 1 mod
.
We don’t need any of that.
q){x mod\:/:x}2_til 20
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1
2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1
2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3 4 5
2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3
2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 0 1
2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7
2 3 4 5 6 7 8 9 10 11 12 0 1 2 3 4 5 6
2 3 4 5 6 7 8 9 10 11 12 13 0 1 2 3 4 5
2 3 4 5 6 7 8 9 10 11 12 13 14 0 1 2 3 4
2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 0 1 2
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 0 1
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 0
2_til[20]
gradually re-appearing in the lower rows, wherever in x mod y
the right argument exceeds the left.
Clearly, evaluating the lower-left triangle tells us nothing useful.
The zeros indicate exact division; let’s highlight them.
q)-1".0"0={x mod\:/:x}2_til 20;
0.0.0.0.0.0.0.0.0.
.0..0..0..0..0..0.
..0...0...0...0...
...0....0....0....
....0.....0.....0.
.....0......0.....
......0.......0...
.......0........0.
........0.........
.........0........
..........0.......
...........0......
............0.....
.............0....
..............0...
...............0..
................0.
.................0
On closer examination, we see that – ignoring the diagonal in which every number is a divisor of itself – every zero below the first two rows has another zero in the first two rows.
q)-1".0"0={x mod\:/:2 3}2_til 20;
0.0.0.0.0.0.0.0.0.
.0..0..0..0..0..0.
- If \(N\) has divisors one of them must be in the range \([2,\sqrt{N}]\). Since \(\sqrt{20}\) is 4.472136 we need try only
2 3 4
as modulus. - 4 is a multiple of 2, so
4 mod
tells us nothing2 mod
did not.
So we can improve our expression:
q)0={(2_til x)mod\:/:2_til floor sqrt x}100
10101010101010101010101010101010101010101010101010101010101010101010101010101..
01001001001001001001001001001001001001001001001001001001001001001001001001001..
00100010001000100010001000100010001000100010001000100010001000100010001000100..
00010000100001000010000100001000010000100001000010000100001000010000100001000..
00001000001000001000001000001000001000001000001000001000001000001000001000001..
00000100000010000001000000100000010000001000000100000010000001000000100000010..
00000010000000100000001000000010000000100000001000000010000000100000001000000..
00000001000000001000000001000000001000000001000000001000000001000000001000000..
q){./[x;n,'n:til count x;not]}0={(2_til x)mod\:/:2_til floor sqrt x}100
00101010101010101010101010101010101010101010101010101010101010101010101010101..
00001001001001001001001001001001001001001001001001001001001001001001001001001..
00000010001000100010001000100010001000100010001000100010001000100010001000100..
00000000100001000010000100001000010000100001000010000100001000010000100001000..
00000000001000001000001000001000001000001000001000001000001000001000001000001..
00000000000010000001000000100000010000001000000100000010000001000000100000010..
00000000000000100000001000000010000000100000001000000010000000100000001000000..
00000000000000001000000001000000001000000001000000001000000001000000001000000..
q)2+where 0=any{./[x;n,'n:til count x;not]}0={(2_til x)mod\:/:2_til floor sqrt x}100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
We have reduced the computation inherited from the APL idiom, but we can still see plenty of unnecessary evaluations. We are still evaluating the lower-left triangle, though it is no longer half the computation.
The top row shows modulo flagging every even number as composite. That is wasted work: we do not need to test even numbers. Nor multiples of 3 or 5…
Let’s back right off and explore a different approach.
Is X prime?
We have a simple test of primality: does \(x\) have a divisor in the range \([2,\sqrt{x}]\)?
q)ip0:{not 0 in x mod 2_ til floor sqrt x} / is prime?
q)ip0 each 19 20
10b
q)20 mod 2 3 4
0 2 0
q)first .[{(z mod y;y+1;z)}]\1 2 11
1 2 11
1 3 11
2 4 11
3 5 11
1 6 11
5 7 11
4 8 11
3 9 11
2 10 11
1 11 11
0 12 11
q)first .[{(z mod y;y+1;z)}]\1 2 12
1 2 12
0 3 12
q)ip:{.[>]1_first .[{(z mod y;y+1;z)}]/1 2,x}
q)ip each 12 17 19 20
0110b
But not every \(x\) needs testing. We know the primes below 10. Above 10, in base-10 we can eliminate multiples of 2 and 5 on sight: test only numbers ending in 1, 3, 7, or 9.
q){x where in[;1 3 7 9]@[;1]10 vs x}10_til 100
11 13 17 19 21 23 27 29 31 33 37 39 41 43 47 49 51 53 57 59 61 63 67 69 71 73..
q)2 3 5 7,{x where ip each x}{x where in[;1 3 7 9]@[;1]10 vs x}10_til 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
ip
still overcomputes.
For example, it tests 17 against every multiple of 2 below it.
Quite unnecessary: if 2 is not a divisor, nor is any of its multiples.
The same applies to multiples of 3, 5, 7, and so on.
The sieve of Eratosthenes
Which brings us to the Sieve of Eratosthenes – which does no arithmetic at all. As we discover primes, we eliminate their multiples from the candidates.
To find the primes below 100, flag the candidates.
q)show c:01b where 2 98 /candidates: not 0 1
00111111111111111111111111111111111111111111111111111111111111111111111111111..
q)c?1b
2
q)(til 100;){0b,99#10b where(x-1),1}2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 ..
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 ..
q)c&:{0b,99#10b where(x-1),1}2
q)c
00010101010101010101010101010101010101010101010101010101010101010101010101010..
q)c?1b
3
q)is:(01b where 2 98;0#0) / initial state
00111111111111111111111111111111111111111111111111111111111111111111111111111..
`long$()
q)step:{[(x;y)](x&0b,(count[x]-1)#10b where(n-1),1;y,n:x?1b)}
q)step is
00010101010101010101010101010101010101010101010101010101010101010101010101010..
,2
q)test:{x>2 last/y}[sqrt 100] / fail after √100
q)test step/is
00000000000001000101000100000101000001000101000100000100000101000001000101000..
2 3 5 7 11
q){y,where x}.test step/is
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
pb0:{ /primes below x
is:(01b where 2,x-2;0#0); / initial state
step:{(x&0b,(count[x]-1)#10b where(n-1),1;y,n:x?1b)}.; / step
test:{x>2 last/y}[sqrt x]; / test
test step/is } / final state
pb: .[{y,where x}] pb0 :: /unwind result
AND or Amend At?
The &
in the step
function above is an example of overcomputing.
It is not necessary to AND two long bitmasks together, particularly as 0s in the second grow increasingly sparse.
Instead one could calculate the indices to change and use Amend At to change just those.
For values of \(x\) over a million, this is more efficient.
Below that, it’s quicker to AND the two bitmasks – ‘overcomputing’ wins.
q)pb 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
q)count pb 2000000
148933
q)count last pb0 2000000
224
Summary
The elegant APL idiom we started with revealed under analysis considerable overcomputing and an \(O(n^2)\) complexity that would not scale well to higher values of \(x\).
A cannier strategy eliminated obvious composite numbers and tested only for divisors smaller than the square root.
Both approaches rely on modulo to find dividers; modulo is based on integer division:
q)mod
k){x-y*x div y}
There is clear potential here for caching known primes and using them either to test for primality, for prime factorisation, or as a seed to discover new primes.
This will be addressed in a forthcoming article here.
In all three approaches the use of vector primitves in the REPL assisted visualisation, and to identify opportunities to improve an algorithm.
Contributors
Noah Attrup, Ferenc Bodon, Geo Carncross, Rian Ó’Cuinneagáin, Stephen Taylor
-
An earlier version of this article appeared on KX Community. ↩
-
Wikipedia: Prime number ↩
-
This is Dyalog APL, version 19. APL prompts with six spaces. ↩
-
APL follows Heaviside’s usage of operator to denote a higher-order function. (Cf iterator in q.) Addition, division and so on are simply functions. ↩
-
There appears to be no accepted equivalent of interval notation for integers. In this article interval notation denotes integers. ↩