INDEX
NEGATIVE LOGITS
Token       Logit
ing         -0.23
our         -0.17
ợi          -0.15
able        -0.15
if          -0.15
ery         -0.14
erg         -0.14
O           -0.14
ded         -0.14
conflict    -0.14
POSITIVE LOGITS
Token       Logit
orida       0.18
dden        0.17
rowsable    0.17
rega        0.15
ellaneous   0.15
#__         0.15
tdown       0.15
antro       0.15
klä         0.15
elage       0.14
Activations Density: 0.026%