Mixture-of-Depths: Dynamic Token Skip Cuts 40% FLOPs
Mixture-of-Depths (MoD) routing lets tokens skip layers dynamically, cutting FLOPs by 40% in GPT-scale models without quality loss. Here's how it works and when it breaks.
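To make the idea of tokens skipping a layer concrete, here is a minimal pure-Python sketch of top-k routing around a single block. It is an illustration under stated assumptions, not the article's implementation: the names `mod_block`, `router_w`, `toy_block`, and `capacity` are hypothetical, the router is assumed to be a single linear projection, and the toy block is a stand-in for real attention plus MLP.

```python
import random

def mod_block(x, router_w, block_fn, capacity):
    """Run block_fn on only the top-k tokens by router score.

    x         -- list of token vectors (each a list of floats)
    router_w  -- one weight per feature (assumed linear router)
    block_fn  -- maps a list of token vectors to a list of update vectors
    capacity  -- fraction of tokens allowed through the block
    """
    k = max(1, int(len(x) * capacity))
    # One scalar routing score per token.
    scores = [sum(w * v for w, v in zip(router_w, tok)) for tok in x]
    # Keep the k highest-scoring tokens, restored to sequence order.
    ranked = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)
    selected = sorted(ranked[:k])
    # Only the selected tokens pay for the block's compute.
    updates = block_fn([x[i] for i in selected])
    out = [list(tok) for tok in x]  # skipped tokens pass through unchanged
    for i, upd in zip(selected, updates):
        # Scaling the update by the score keeps the router on the gradient path.
        out[i] = [t + scores[i] * u for t, u in zip(x[i], upd)]
    return out, selected

random.seed(0)
d_model, seq_len = 4, 8
x = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]
router_w = [random.gauss(0, 1) for _ in range(d_model)]

def toy_block(tokens):
    # Stand-in for attention + MLP; a real block would mix across tokens.
    return [[2.0 * v for v in tok] for tok in tokens]

out, selected = mod_block(x, router_w, toy_block, capacity=0.5)
print(f"processed {len(selected)} of {seq_len} tokens:", selected)
```

With `capacity=0.5`, half the tokens reach the block and the rest ride the residual path. The savings in a real model depend on which layers are routed and on the capacity chosen, so this toy setting should not be read as reproducing the 40% figure.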