This won't compile as-is, because the xAVX functions are missing. I have a rough implementation of them, but it's brutal: 5400 lines total. For now, each xAVX call can be replaced with the corresponding x function (remove AVX from the name) so that the kernel compiles.
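
If renaming every call site by hand is tedious, one possible shortcut is a small shim that maps each AVX name onto its plain counterpart. The sketch below is only an illustration: `scale`, `scaleAVX`, `first_scaled`, and the `HAVE_AVX_IMPL` guard are hypothetical names, not functions from this code, and it assumes each xAVX function takes the same arguments as its x version.

```c
/* Sketch of a fallback shim. All names here are hypothetical. */
#include <stddef.h>

/* Plain (non-AVX) routine, standing in for one of the existing x functions. */
static void scale(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

/* Until a real AVX version is available, route the AVX name to the plain one.
 * Assumes both variants share the same signature. */
#ifndef HAVE_AVX_IMPL
#define scaleAVX scale
#endif

/* Caller code keeps using the AVX name unchanged. */
static float first_scaled(const float *src, float k, size_t n)
{
    float out[8] = {0};
    scaleAVX(out, src, k, n < 8 ? n : 8);
    return out[0];
}
```

With one `#define` per missing function, the call sites stay untouched, and the shim can be deleted once the real AVX versions are in place.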