UT5: Pretraining Non autoregressive T5 with unrolled denoising • arXiv:2311.08552 • Published Nov 14, 2023
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers • arXiv:2311.10642 • Published Nov 17, 2023