INDEX
Negative Logits
be          -0.08
are         -0.08
were        -0.07
attract     -0.07
貼          -0.07
alo         -0.06
chin        -0.06
V           -0.06
VI          -0.06
Navigation  -0.06
Positive Logits
Does        0.14
Does        0.13
does        0.12
does        0.11
DOES        0.10
.Does       0.09
Doe         0.09
血          0.07
르게        0.07
helps       0.07
Activations Density: 0.029%