INDEX
Explanations
initiator or stabilizer roles
Negative Logits
Annex     0.42
原文      0.42
后缀      0.41
Versch    0.40
法令      0.40
мель      0.40
Estat     0.39
Derived   0.39
Trapez    0.39
couper    0.38
Positive Logits
initro      0.44
induction   0.42
vegetables  0.42
ıyı         0.41
ity         0.40
kin         0.39
nil         0.38
induction   0.38
종류        0.38
song        0.38
Activations Density 0.001%