Explanations
expressions of surprise or amazement
Negative Logits
loor        -0.18
atör        -0.16
егор        -0.15
istrov      -0.15
enschaft    -0.15
令          -0.15
avou        -0.15
blr         -0.14
ergarten    -0.14
inel        -0.14
Positive Logits
zers        0.27
zer         0.24
za          0.22
zas         0.18
talk        0.18
-factor     0.17
Factor      0.17
indr        0.17
/flutter    0.16
factor      0.16
Activations Density 0.016%
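
For reference, an activation density figure like the one above is usually read as the fraction of sampled tokens on which the feature fires. The sketch below assumes that definition. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active; this page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 2 of 12,500 tokens.
sample = [0.0] * 12_498 + [0.7, 1.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.016% corresponds to roughly 16 active tokens per 100,000 sampled.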