INDEX
Negative Logits
balancing    -0.08
letics       -0.07
sustaining   -0.07
Sør          -0.07
itionally    -0.07
-bal         -0.07
format       -0.07
             -0.07
onent        -0.07
passen       -0.07
Positive Logits
ロン         0.09
strict       0.09
.strict      0.09
엄           0.09
stric        0.08
stringent    0.08
kraju        0.08
sexo         0.08
STRICT       0.08
prohibition  0.08
Activations Density 0.001%