Discussion about this post

Neural Foundry:
VeRA is pretty clever with its shared random matrices across layers. The memory savings from only training those tiny scaling vectors add up fast on bigger models. Have you seen any benchmarks comparing VeRA to standard LoRA on instruction tuning tasks?
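To make the parameter savings concrete, here is a minimal NumPy sketch of the VeRA idea as described in the comment: two random projection matrices are frozen and shared across all layers, and each layer only trains two small scaling vectors. The dimensions and variable names are illustrative assumptions, not taken from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

in_dim, out_dim, rank, n_layers = 64, 64, 4, 3

# Frozen random projections, shared by every adapted layer (the VeRA idea).
A = rng.standard_normal((rank, in_dim))   # shared down-projection
B = rng.standard_normal((out_dim, rank))  # shared up-projection

# Per-layer trainable state: just two small scaling vectors.
# Initialized to zero so the adapter starts as a no-op.
layers = [{"d": np.zeros(rank), "b": np.zeros(out_dim)}
          for _ in range(n_layers)]

def vera_delta(x, d, b):
    """Low-rank update for one layer: b * (B @ (d * (A @ x)))."""
    return b * (B @ (d * (A @ x)))

# Trainable parameters per layer:
vera_params = rank + out_dim              # VeRA: 4 + 64 = 68
lora_params = rank * (in_dim + out_dim)   # standard LoRA: 4 * 128 = 512
```

With these toy dimensions VeRA trains roughly 7.5x fewer parameters per layer than a rank-4 LoRA, and the gap widens as `in_dim`/`out_dim` grow, since the shared `A` and `B` are never counted per layer.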

