INDEX
Explanations
negations or expressions of unwillingness or incapacity
Negative Logits
çĶ  -0.15
iaux  -0.14
_asm  -0.14
νος  -0.14
sta  -0.14
dete  -0.14
cht  -0.14
ÌĨ  -0.14
лами  -0.14
halt  -0.13
Positive Logits
want  0.25
wants  0.21
confidence  0.20
wanted  0.19
feel  0.19
muốn  0.19
mood  0.18
trust  0.18
Mood  0.17
Want  0.17
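The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The page does not say how the values were computed. A common approach for sparse-autoencoder feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the extremes. The helper names (`logit_effects`, `top_and_bottom`) and the toy inputs are hypothetical, and any final layer-norm scaling a real dashboard might apply is omitted.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
) -> List[float]:
    """Project a feature's decoder direction through the unembedding matrix.

    Returns one value per vocabulary token: the direct effect of the feature
    direction on that token's logit (layer-norm scaling is ignored here).
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 placeholder tokens.
    unembed = [[1.0, -1.0, 0.5, 0.0], [0.0, 0.5, 0.5, -1.0]]
    effects = logit_effects([0.2, 0.1], unembed)
    positive, negative = top_and_bottom(effects, ["a", "b", "c", "d"], k=2)
    print("positive:", positive)
    print("negative:", negative)
```

With k = 10 and a real unembedding matrix, the two returned lists would correspond to the "Positive Logits" and "Negative Logits" columns, under the projection assumption stated above.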
Activation Density: 0.086%
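Activation density is read here as the percentage of sampled token positions on which the feature fires, taken to mean an activation above zero; both that definition and the zero threshold are assumptions, not something this page states. Under that reading, 0.086% means the feature is active on roughly 86 of every 100,000 tokens. A minimal sketch with a hypothetical `activation_density` helper:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The zero default threshold is an assumption about how "active" is defined.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy data: 86 active positions out of 100,000.
    acts = [0.0] * 99_914 + [1.0] * 86
    print(f"{activation_density(acts):.3f}%")
```

A low density like this one marks a sparse feature, which is why its top-activating examples are usually few and specific.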