INDEX
Explanations
rhetorical questions and strong emotional expressions
Negative Logits
obil        -0.17
oy          -0.15
ESSAGES     -0.15
eway        -0.15
oran        -0.14
ahead       -0.14
_Checked    -0.14
rippling    -0.14
htar        -0.14
ambio       -0.13
Positive Logits
Serv        0.15
andra       0.14
nict        0.14
bras        0.14
sec         0.14
inplace     0.14
nih         0.13
ILT         0.13
eden        0.13
lô          0.13
Activations Density: 0.020%