Artax-ttx3-mega-multi-v4

We've seen a quiet but massive shift in how LLMs are being stitched together under the hood. Not MoE in the traditional sparse sense, but something closer to multi-opinion consensus routing.

Enter Artax-ttx3-mega-multi-v4.

Early benchmarks (leaked? maybe) show it beating GPT-4o on MATH-500 by ~4% and GPQA by ~7%, while using 2.3x less active FLOPs per token than standard MoE.
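Since "multi-opinion consensus routing" isn't a published technique as far as I know, here's a toy sketch of how I'd interpret it next to standard sparse MoE gating. To be clear, this is pure speculation on my part: the function names, the agreement heuristic, and the routing logic below are all hypothetical, not anything from Artax itself.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sparse_moe(x, experts, gate_w, k=2):
    # Standard sparse MoE: a learned gate picks the top-k experts
    # for this token, and only those k experts run.
    logits = gate_w @ x                           # (n_experts,)
    topk = np.argsort(logits)[-k:]                # indices of the k winners
    w = softmax(logits[topk])
    return sum(wi * experts[i](x) for wi, i in zip(w, topk))

def consensus_route(x, experts):
    # Hypothetical "multi-opinion consensus" reading: a committee of
    # experts each produce an opinion, and each opinion is weighted by
    # how closely it agrees with the committee mean, so outliers get
    # down-weighted instead of a gate picking winners up front.
    opinions = np.stack([e(x) for e in experts])  # (n_experts, d)
    center = opinions.mean(axis=0)
    agreement = -np.linalg.norm(opinions - center, axis=1)
    w = softmax(agreement)
    return w @ opinions                           # consensus-weighted mix

# Tiny usage example with random linear "experts"
rng = np.random.default_rng(0)
d, n = 8, 4
experts = [lambda v, W=rng.normal(size=(d, d)): W @ v for _ in range(n)]
gate_w = rng.normal(size=(n, d))
x = rng.normal(size=d)
print(sparse_moe(x, experts, gate_w))
print(consensus_route(x, experts))
```

Note the naive committee version runs every expert, which wouldn't explain the "fewer active FLOPs" claim, so presumably the real thing does consensus over a small routed subset rather than the full expert pool.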

Would love to hear if anyone has run it on long-form multi-step reasoning tasks (legal docs, code agents, scientific literature review).
