INDEX
Explanations
expressions of disbelief or astonishment
NEGATIVE LOGITS
uada       -0.16
aux        -0.15
048        -0.15
βε         -0.14
_FF        -0.14
è¾¼        -0.14
lamaz      -0.14
ä»Ĭ        -0.14
partial    -0.14
εί         -0.14
POSITIVE LOGITS
heits      0.16
tie        0.15
ulse       0.15
uhl        0.15
Marshal    0.15
691        0.14
Foster     0.14
anyone     0.14
Tie        0.14
waters     0.14
Activations Density: 0.098%