INDEX
Explanations
intensifiers and strong expressions of frustration or emphasis
Negative Logits
殿
-0.16
ä¿
-0.15
áÅĻ
-0.15
elib
-0.15
ird
-0.15
olland
-0.14
ult
-0.14
ilig
-0.14
similarly
-0.13
equally
-0.13
Positive Logits
auer
0.17
STE
0.14
@{↵
0.14
mal
0.14
reation
0.14
éo
0.13
mal
0.13
Mal
0.13
\e
0.13
ØŃÙħ
0.13
Activations Density 0.024%