Negative Logits
Token        Logit
juice        -0.07
railway      -0.07
êtes         -0.07
ülük         -0.07
RIGHTS       -0.06
Were         -0.06
Dialogue     -0.06
Directive    -0.06
Policy       -0.06
orris        -0.06
Positive Logits
Token        Logit
sign         0.07
.col         0.07
Bian         0.07
significa    0.07
_BASE        0.06
leased       0.06
unveiling    0.06
.entry       0.06
comments     0.06
милли        0.06
Activations Density: 0.006%