Explanations
phrases indicating preferences, desires, or intentions
Negative Logits
ione        -0.16
inos        -0.14
form        -0.14
iday        -0.14
폭          -0.14
cid         -0.13
рава        -0.13
imson       -0.13
831         -0.13
Visibility  -0.13
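Ranked logit lists like this one are commonly produced for sparse-autoencoder features by projecting the feature's decoder direction through the model's unembedding matrix, then reading off the most negative and most positive scores. This page does not state its method, so the sketch below is an assumption; `top_logits`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and plain Python lists stand in for real model weights.

```python
from typing import Sequence


def top_logits(
    decoder_dir: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical helper: W_U is assumed to be d_model rows x d_vocab columns.
    """
    d_model = len(decoder_dir)
    # Score each vocabulary entry as the dot product of the decoder
    # direction with that token's column of the unembedding matrix.
    scores = [
        sum(decoder_dir[r] * W_U[r][j] for r in range(d_model))
        for j in range(len(vocab))
    ]
    # Token ids sorted from most negative to most positive score.
    ranked = sorted(range(len(vocab)), key=lambda j: scores[j])
    negative = [(vocab[j], scores[j]) for j in ranked[:k]]
    positive = [(vocab[j], scores[j]) for j in ranked[::-1][:k]]
    return negative, positive
```

With a toy two-dimensional model, `top_logits([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)` returns `([("b", -0.2)], [("a", 0.5)])`: only the first row of `W_U` matters because the second coordinate of the direction is zero.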
Positive Logits
peč         0.17
oir         0.16
лит         0.15
uge         0.15
reau        0.14
타          0.14
Labour      0.14
orr         0.14
Saunders    0.14
_CHILD      0.14
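Raw dashboard output for these lists can contain strings such as ÑĢава or peÄį. A plausible reading, assumed rather than stated by the page, is that they are GPT-2-style byte-level BPE token strings that were only partly decoded: that tokenizer displays every byte that is not printable Latin-1 (control characters, the space, 0x7F to 0xA0, and 0xAD) as a stand-in character such as Ģ or į. The sketch below rebuilds the GPT-2 byte-to-character table and maps such strings back to UTF-8. `decode_token` is a hypothetical helper, and treating characters outside the table as already decoded is a heuristic.

```python
def bytes_to_unicode() -> dict[int, str]:
    # Printable Latin-1 bytes map to themselves; the remaining bytes
    # are assigned stand-in characters chr(256), chr(257), ... in order.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Reverse table: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Heuristic (assumption): characters outside the table were
            # already decoded, so re-encode them as UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


for tok in ["peÄį", "ÑĢава", "лиÑĤ", "íĥĢ", "íıŃ"]:
    print(tok, "->", decode_token(tok))
```

Under this reading the five strings decode to peč, рава, лит, 타, and 폭; plain ASCII tokens such as Labour or _CHILD pass through unchanged.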
Activation density: 0.363%
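Activation density is usually the percentage of sampled token positions on which the feature fires, so 0.363% means roughly 3.6 tokens in every 1,000. A minimal sketch, assuming "fires" means an activation strictly above zero; the page states neither its threshold nor its sample set, and `activation_density` is a hypothetical name.

```python
from typing import Iterable


def activation_density(acts: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in acts:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total
```

For example, 363 positive activations among 100,000 sampled tokens give a density of 0.363.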