Explanations
phrases that include casual expressions or informal language, particularly those introducing parenthetical information
Negative Logits
uala     -0.16
616      -0.14
Hurt     -0.13
td       -0.13
ouz      -0.13
igit     -0.13
Unsure   -0.13
wp       -0.13
row      -0.13
313      -0.13
Positive Logits
krom     0.16
OLON     0.16
à¸IJ     0.15
azen     0.15
áli      0.15
abra     0.14
olon     0.14
SID      0.14
heimer   0.14
encount  0.14
Activations Density 0.069%