INDEX
Explanations
expressions of admiration or astonishment
Negative Logits
sillon      -0.77
lüğ         -0.73
emplois     -0.73
Carole      -0.70
yyah        -0.70
Carole      -0.70
cientos     -0.70
nostru      -0.67
uyla        -0.65
Barrow      -0.65
Positive Logits
DMG         0.85
Amaz        0.83
Ains        0.83
Vain        0.81
Erm         0.81
jectures    0.79
Flames      0.78
AMAZING     0.78
Lain        0.77
Chains      0.77
Activations Density: 0.090%