One thing that comes up in the sort of code ML I like to write is a careful attention to memory layout. Cracovians, defined according to some sibling comment as (B^T)A, make that a little more natural, since B and A can now have the same layout. I haven't used them though, so I don't have a good sense of whether that's more or less painful than other approaches.