INDEX
Negative Logits
we        -1.70
We        -1.03
We        -1.02
我們      -1.00
我们      -0.92
мы        -0.92
we        -0.89
我们就    -0.76
нами      -0.75
kita      -0.75
POSITIVE LOGITS
are         0.98
have        0.83
believe     0.74
seek        0.73
operate     0.72
rely        0.71
intend      0.71
advertise   0.71
compensate  0.70
strive      0.69
Activations Density: 0.088%