NEGATIVE LOGITS
Token         Logit
either        -0.98
Either        -0.84
appartement   -0.82
ပ             -0.80
gagné         -0.79
contribué     -0.78
あっという    -0.77
tatuaje       -0.77
Lü            -0.77
высокая       -0.76

POSITIVE LOGITS
Token         Logit
as            1.00
below         0.98
nedan         0.85
dibawah       0.84
HERE          0.83
;             0.83
اصله          0.81
how           0.81
Lotus         0.80
etter         0.79
Activations Density 0.010%