Optimize a lattice-based signature algorithm: ml-dsa

We have optimized ML-DSA, a lattice-based signature algorithm.

Changes:

The changes are small and easy to review. In benchmarks, the optimized version runs in approximately 60–70% of the original cycle count.

We have also provided CKB-VM optimized implementations of SHAKE128, SHAKE256, SHA3-256, and SHA3-512.
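For anyone who wants to sanity-check an optimized build of these four functions, here is a minimal sketch that produces reference digests with Python's standard `hashlib`. It is not the CKB-VM implementation; the helper name `reference_digests` and the 32-byte XOF output length are assumptions made for illustration only.

```python
import hashlib


def reference_digests(message: bytes, xof_len: int = 32) -> dict[str, str]:
    """Return hex digests of `message` for the four functions named above.

    `reference_digests` is a hypothetical helper, not part of any project API.
    SHAKE128/SHAKE256 are extendable-output functions, so the caller picks the
    output length in bytes (`xof_len`, an assumed default of 32).
    """
    return {
        "SHAKE128": hashlib.shake_128(message).hexdigest(xof_len),
        "SHAKE256": hashlib.shake_256(message).hexdigest(xof_len),
        "SHA3-256": hashlib.sha3_256(message).hexdigest(),
        "SHA3-512": hashlib.sha3_512(message).hexdigest(),
    }


if __name__ == "__main__":
    # Print reference values for the empty message; compare them by hand
    # against the output of the implementation under test.
    for name, digest in reference_digests(b"").items():
        print(f"{name}: {digest}")
```

Comparing a handful of inputs (empty, one byte, a message longer than one rate block) against these values should catch most porting mistakes before running cycle benchmarks.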
