license: apache-2.0

⛔ This model is not the iDUS model. It is a variant built to test the effectiveness of iDUS.

interlocked-DUS(iDUS)

We attempted to improve the performance of the model by further minimizing the layer distance without significantly departing from the framework of DUS.

Architectural Details

We propose interlocked-DUS (iDUS), a variant of DUS. As the name suggests, it does not connect the layers as a whole, as DUS does, but divides them into groups and merges the groups so that they interlock with each other. With this mechanism, iDUS reduces the layer distance, which was important in DUS, more effectively, and aims for greater strength in processing. The figure below illustrates the overall framework of iDUS.

This model interlocks the layers using a single layer as the group unit, to test the effectiveness of iDUS.
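The exact grouping scheme is not spelled out in this card, so the following is only a hypothetical sketch of what "interlocking" could look like at the level of layer indices. It assumes a 32-layer base model and a DUS setup in which one copy drops its last 8 layers and the other drops its first 8 (both values are assumptions, as are the function names). It compares the largest jump between adjacent layer indices for plain DUS and for one possible interleaving.

```python
def dus_layer_order(n_layers=32, m=8):
    """Plain DUS (assumed setup): copy A keeps layers [0, n-m), copy B keeps [m, n)."""
    return list(range(0, n_layers - m)) + list(range(m, n_layers))


def interlocked_layer_order(n_layers=32, m=8, group=8):
    """Hypothetical interlocking: alternate groups of `group` layers from A and B.

    This is an illustrative guess, not the authors' confirmed procedure.
    """
    a = list(range(0, n_layers - m))
    b = list(range(m, n_layers))
    order = []
    for start in range(0, len(a), group):
        order.extend(a[start:start + group])  # next group from copy A
        order.extend(b[start:start + group])  # matching group from copy B
    return order


def max_seam_gap(order):
    """Largest absolute index jump between two adjacent layers in the stack."""
    return max(abs(nxt - cur) for cur, nxt in zip(order, order[1:]))


if __name__ == "__main__":
    for name, order in [
        ("DUS", dus_layer_order()),
        ("interlocked, group=8", interlocked_layer_order(group=8)),
        ("interlocked, group=1", interlocked_layer_order(group=1)),
    ]:
        print(f"{name}: {len(order)} layers, max seam gap {max_seam_gap(order)}")
```

Under these assumptions every variant keeps the same total depth, and only the ordering, and therefore the distance between adjacent layers at each seam, changes.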

πŸ† HuggingFace Open LLM Leaderboard

| Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | Average |
|---|---|---|---|---|---|---|---|
| Llama2_init_Mistral | 60.07 | 83.3 | 64.09 | 42.15 | 78.37 | 37.91 | 60.98 |
| SOLAR-10.7B-DUS-Implementation | 59.56 | 81.18 | 63.68 | 40.72 | 76.48 | 26.99 | 58.1 |
| iDUS-1layer | 27.73 | 26.65 | 24.91 | 48.58 | 49.17 | 0 | 29.51 |
| iDUS(iDUS-8layer) | 59.3 | 81.34 | 63.22 | 40.62 | 76.24 | 29.57 | 58.38 |
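
The Average column appears to be the plain mean of the six benchmark scores. A small check, using only the numbers in the table (the variable and function names are illustrative):

```python
# Per-model scores in table order: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K
scores = {
    "Llama2_init_Mistral": [60.07, 83.3, 64.09, 42.15, 78.37, 37.91],
    "SOLAR-10.7B-DUS-Implementation": [59.56, 81.18, 63.68, 40.72, 76.48, 26.99],
    "iDUS-1layer": [27.73, 26.65, 24.91, 48.58, 49.17, 0],
    "iDUS(iDUS-8layer)": [59.3, 81.34, 63.22, 40.62, 76.24, 29.57],
}


def average(values):
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(values) / len(values), 2)


if __name__ == "__main__":
    for model, values in scores.items():
        print(f"{model}: {average(values)}")
```

The computed means match the reported averages, which makes the gaps easy to read: iDUS-8layer sits 0.28 points above the DUS implementation, while iDUS-1layer falls to 29.51.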