Explanations
statements that assert or emphasize a viewpoint or opinion
Negative Logits
ãi        -0.15
ucwords   -0.15
ÑĥÑĢа      -0.14
hi        -0.13
TION      -0.13
acman     -0.13
kü        -0.13
UNDLE     -0.13
uche      -0.13
-:        -0.13
Positive Logits
namely    0.32
nam       0.22
Nam       0.21
Instead   0.21
There     0.21
Each      0.20
It        0.19
They      0.19
Either    0.19
viz       0.18
Activations Density: 0.083%
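
For a sense of scale, the density figure can be read as the share of sampled tokens on which the feature fires. The sketch below assumes that reading (activation strictly above zero counts as "active"); the page does not state how the figure was computed, and the function name, the threshold parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active. This definition is illustrative, not taken from the page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 1,200.
sample = [0.0] * 1199 + [2.5]
print(f"{activation_density(sample):.3%}")  # prints "0.083%"
```

Under this reading, a density of 0.083% corresponds to roughly one active token in every 1,200.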