Incidentally, here is why a prime is a sum of two squares just in case it is either 2 or 1 mod 4:
First, among the complex numbers, let’s say those of the form a + b * i for ordinary integers a and b are “Gaussian integers”. Also, as usual, we’ll say the size |a + b * i| of a complex number a + b * i is sqrt(a^2 + b^2) [equivalently, the squared size |a + b * i|^2 of a complex number a + b * i is obtained by multiplying it by its conjugate a - b * i]. So the question is, which primes are squared sizes of Gaussian integers?; i.e., which primes are products of two conjugate Gaussian integers?
The benefit of looking at things this way is that the Gaussian integers have a structure very similar to the ordinary integers. In particular, they come with a notion of prime factorization very similar to that for the ordinary integers.
To see this, let’s review how prime factorization works:
Let’s call two integers “similar” if each is a multiple of the other (so, e.g., x and -x are similar; note that similar values have the same size). From now on, except where otherwise specified, I will not distinguish between similar values; they’re as good as equal to me.
We say a nonzero integer p is “prime” if any way of expressing it as a product of integers involves a factor of p. (Note that 1 (or any value similar to 1; aka, any “unit”) is not considered prime because of the empty product.)
By repeatedly breaking any particular nonzero integer into a product of smaller integers, if possible, and stopping when reaching primes, we find that every nonzero integer has some representation as a product of primes. (Note that this factorization process cannot go on forever, by consideration of how sizes get smaller at each stage)
Not only that, but this representation is unique. (This follows from the fact that if a prime divides a product, it divides one of the product’s factors. [This in turn can be seen as so: if prime p divides ab but does not divide a, we have that p is a common divisor of ab and pb, so p must divide gcd(ab, pb) = gcd(a, p) * b = 1 * b = b. (This reasoning in turn uses basic facts about gcd [specifically, that any common divisor divides the gcd, and that gcd scales up as its inputs scale up] which can be seen from, for example, the Euclidean algorithm to compute gcds, which makes use of a “division and remainder (smaller than the divisor)” operator, whose existence just depends on the fact that any ratio of integers has distance less than 1 from some integer. Again, this algorithm cannot go on forever by consideration of how sizes get smaller at each stage.)])
All of the above was phrased in terms of integers, but, in fact, it works just as well replacing “integer” by “Gaussian integer” throughout! So we have that each nonzero Gaussian integer uniquely factorizes into a product of Gaussian primes.
In particular, each ordinary integer also has a factorization as a product of Gaussian primes, which we can obtain by first factoring it into ordinary primes, and then further factoring those ordinary primes into Gaussian primes. So let’s try to understand how ordinary primes factor into Gaussian primes.
Let p be an ordinary prime and let q be a Gaussian prime factor of p. By symmetry, q’s conjugate must also be a prime factor of p. It may be that q is similar to its conjugate or not.
If q and its conjugate aren’t similar, then their product is a non-unit integer factor of p, and thus must be p itself. In this case, let’s say p is a “bifurcating” prime; it’s prime within the integers, but splits into two dissimilar conjugate primes within the Gaussian integers.
On the other hand, if q and its conjugate are similar, then q is either similar to an integer or similar to an integer multiple of 1 + i. In the former case, as p is prime within the integers, we must have that q is similar to p itself, and thus p is already a Gaussian prime (let’s call such p “atomic”). In the latter case (with q similar to a multiple of 1 + i), we have that p is divisible by 1 + i, and thus p^2 = |p|^2 is divisible by |1 + i|^2 = 2; this can only occur when p itself is even, which is to say, when p is 2. And, indeed, 2 has the Gaussian prime factorization (1 + i) * (1 - i), a product of two similar conjugates.
In summary, every odd ordinary prime is either itself a Gaussian prime (in which case we’ll call it “atomic”), or the product of two dissimilar conjugate Gaussian primes (in which case we’ll call it “bifurcating”). The one other ordinary prime is 2, which has the Gaussian prime factorization (1 + i) * (1 - i), a product of two similar conjugates.
Phew! That’s quite a bit already, but we’re almost done. Now we just need to figure out which odd primes are atomic/bifurcating. The bifurcating ones will be products of conjugates (i.e., sums of two squares) and the atomic ones will not be.
If ordinary prime p equals (A + iB) * (A - iB) = A^2 + B^2, then B won’t be divisible by p, and so A/B will be a well-defined square root of -1 in the field of integers mod p. Conversely, if there is a square root r of -1 in the field of integers mod p, then we have that 1 + r^2 = (1 + ir) * (1 - ir) is a multiple of p, even though p divides neither factor, and thus, p cannot be a Gaussian prime itself. Thus, p is atomic just in case there is no square root of -1 modulo p.
How do we tell if -1 has a square root modulo p? More generally, how do we tell if a value has a square root modulo p? Well, note that two values have the same square just in case they are each other’s negations; restricting attention to the case where p is odd, no nonzero value is its own negation, so the p - 1 nonzero values square to precisely (p - 1)/2 distinct nonzero values. Furthermore, Fermat’s Little Theorem tells us r^(p - 1) = (r^2)^((p - 1)/2) is 1 modulo p for any nonzero r; accordingly, the nonzero values with square roots modulo p are precisely the (p - 1)/2 many solutions to x^((p - 1)/2) = 1.
Does (-1)^((p - 1)/2) = 1? Well, by the nature of -1, this is true just in case (p - 1)/2 is even, which is to say, just in case p is 1 mod 4. Thus, we see that the bifurcating odd primes are precisely those which are 1 mod 4 (with the primes which are 3 mod 4 being atomic, and the even prime 2 being its own odd man out), completing our investigation.