CoreInfer is an MLP-free adaptive sparse activation inference method based on sentence-level prediction that achieves a 10.33x speedup compared to the Transformers implementation. The overview framework of ...