INDEX
Explanations
phrases indicating willingness or readiness to take action
Negative Logits
itler    -0.16
llib     -0.16
luet     -0.15
eday     -0.15
pee      -0.14
\CMS     -0.14
æģ       -0.14
emp      -0.14
pling    -0.14
á»ĩn     -0.14
Positive Logits
sacrifice      0.15
sacrifices     0.15
iscard         0.14
">//           0.14
apt            0.14
Ramos          0.14
sacrificing    0.14
unsupported    0.14
slt            0.14
amina          0.14
Activations Density 0.073%