NEGATIVE LOGITS
deny        -1.02
denial      -0.96
Admit       -0.90
denying     -0.89
denies      -0.87
Denial      -0.86
denial      -0.85
]--;        -0.84
ніципа      -0.84
Deny        -0.84
POSITIVE LOGITS
fe          0.38
HexString   0.38
per         0.35
sted        0.34
protested   0.34
कारी        0.34
kró         0.32
TagMode     0.32
tác         0.32
bé          0.31
Activations Density 0.002%