INDEX
Explanations
conditional phrases and questions related to decisions and actions
Negative Logits
itou        -0.15
contres     -0.14
andr        -0.14
----------------------------------------------------------------------------  -0.14
sofar       -0.13
cl          -0.13
胡          -0.13
tein        -0.13
eland       -0.13
omik        -0.13
Positive Logits
something   0.23
things      0.22
something   0.22
someone     0.21
somebody    0.20
Something   0.20
omething    0.19
someone     0.19
Something   0.19
有人        0.18
Activations Density 0.220%
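
As a rough illustration of the density figure above, the sketch below assumes that activation density means the fraction of token positions on which the feature's activation is non-zero. That definition, the function name activation_density, and the sample data are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of positions where the feature fires at all.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 5000 token positions, 11 of which activate the feature.
sample = [0.0] * 4989 + [0.8] * 11
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density well below 1% describes a sparse feature that fires on only a small share of tokens.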