Aren’t octonions non-associative, though? It seems to me that would make them very clumsy to work with.
Brief introduction to quaternions: In the real numbers, you have one unit, 1. Any real number can be expressed as the product of a real number and 1. In the complex numbers, you have two units, 1 and i. Every complex number can be expressed as a sum of products of real numbers with the units. That is to say, any complex number can be written as a·1 + bi. To add complex numbers, you just add the corresponding parts, but to multiply them, you need one additional rule: ii = -1. Then you multiply two complex numbers just as you’d multiply any polynomials (using the good old FOIL rule or equivalent), and wherever ii appears in your answer, you replace it with -1.
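As a small illustration of the "multiply like polynomials, then replace ii with -1" recipe, here is a minimal Python sketch. Representing a complex number as a pair (a, b) meaning a·1 + bi, and the helper name `complex_mul`, are illustrative choices, not anything standard; the result is cross-checked against Python's built-in `complex` type.

```python
# Multiply two complex numbers written as pairs (a, b), meaning a*1 + b*i,
# by expanding the product term by term and replacing i*i with -1.
def complex_mul(x, y):
    a, b = x
    c, d = y
    # (a + bi)(c + di) = ac + adi + bci + bd(ii), and ii = -1
    real = a * c - b * d
    imag = a * d + b * c
    return (real, imag)

# (1 + 2i)(3 + 4i) = 3 + 4i + 6i + 8(ii) = -5 + 10i
p = complex_mul((1, 2), (3, 4))
assert p == (-5, 10)
# Cross-check against Python's built-in complex type.
assert complex(*p) == complex(1, 2) * complex(3, 4)
```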
In the quaternions, you have four units (hence the name): 1, i, j, and k. Again, we need some new multiplication rules: ii = jj = kk = -1, and ij = k, jk = i, and ki = j. To make quaternions associative (a very desirable property, meaning that x(yz) = (xy)z for any x, y, and z), it turns out that you also need to set ji = -k, kj = -i, and ik = -j. As a result, quaternions are not commutative: it’s not always true that xy = yx. Commutativity is a useful property, but it’s not actually as important as associativity, and hey, you can’t have everything.
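The rules above are enough to multiply any two quaternions: expand the product term by term, as with complex numbers, and look up each product of units in a table. The Python sketch below does exactly that. Encoding a quaternion as a tuple (a, b, c, d) meaning a + bi + cj + dk, and the names `TABLE` and `qmul`, are illustrative assumptions. The final loop only spot-checks associativity on a few integer samples; it is a sanity check, not a proof.

```python
from itertools import product

# Units 1, i, j, k are indices 0..3.
# TABLE[u][v] = (sign, w) means (unit u) * (unit v) = sign * (unit w).
TABLE = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],    # 1*1 = 1, 1*i = i, 1*j = j, 1*k = k
    [(1, 1), (-1, 0), (1, 3), (-1, 2)],  # i*1 = i, ii = -1, ij = k, ik = -j
    [(1, 2), (-1, 3), (-1, 0), (1, 1)],  # j*1 = j, ji = -k, jj = -1, jk = i
    [(1, 3), (1, 2), (-1, 1), (-1, 0)],  # k*1 = k, ki = j, kj = -i, kk = -1
]

def qmul(x, y):
    """Multiply quaternions (a, b, c, d) = a + bi + cj + dk term by term."""
    out = [0, 0, 0, 0]
    for u, v in product(range(4), range(4)):
        sign, w = TABLE[u][v]
        out[w] += sign * x[u] * y[v]
    return tuple(out)

ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
assert qmul(I, J) == K and qmul(J, I) == (0, 0, 0, -1)  # ij = k but ji = -k

# Spot-check associativity, x(yz) == (xy)z, on a few integer quaternions.
samples = [(1, 2, 3, 4), (0, -1, 5, 2), (3, 0, -2, 1)]
for x, y, z in product(samples, repeat=3):
    assert qmul(x, qmul(y, z)) == qmul(qmul(x, y), z)
```

The sign flips are forced rather than chosen: if associativity holds, then i(ij) = ik must equal (ii)j = -j, which is exactly the rule ik = -j.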
Any system of objects that obeys these multiplication rules, together with the usual vector addition rules, is a system of quaternions. No particular representation is necessary; you can just call the units 1, i, j, and k if you like. It happens, though, that one can choose a set of matrices that behave in exactly this manner, and it’s often convenient to do so, since a lot of machinery for working with matrices already exists. The matrices themselves are sometimes called the quaternions, but that’s not strictly accurate: they’re just one of many ways of representing quaternions.
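To make the matrix point concrete, here is a minimal Python sketch using one common choice of 2×2 complex matrices for 1, i, j, and k (sign conventions vary between sources, and other equivalent choices exist). It checks that ordinary matrix multiplication reproduces every rule listed above. The helpers `matmul` and `neg` are illustrative names, written in plain Python so nothing beyond the standard language is assumed.

```python
# Quaternion units as 2x2 complex matrices (lists of rows).
# This is one common choice; other, equivalent choices exist.
ONE = [[1, 0], [0, 1]]
I = [[1j, 0], [0, -1j]]
J = [[0, 1], [-1, 0]]
K = [[0, 1j], [1j, 0]]

def matmul(a, b):
    """Ordinary 2x2 matrix product."""
    return [[sum(a[r][t] * b[t][c] for t in range(2)) for c in range(2)]
            for r in range(2)]

def neg(m):
    """Negate every entry of a matrix."""
    return [[-x for x in row] for row in m]

# The matrices obey exactly the quaternion multiplication rules.
assert matmul(I, I) == matmul(J, J) == matmul(K, K) == neg(ONE)
assert matmul(I, J) == K and matmul(J, K) == I and matmul(K, I) == J
assert matmul(J, I) == neg(K) and matmul(K, J) == neg(I) and matmul(I, K) == neg(J)
```

Any other set of objects passing the same checks would serve equally well, which is the sense in which the matrices represent the quaternions rather than being them.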