INDEX
Negative Logits
unlawful    -0.10
Ernst       -0.10
inati       -0.10
Pra         -0.10
ReadOnly    -0.09
ilio        -0.09
mani        -0.09
ussy        -0.09
igon        -0.09
wh          -0.08
Positive Logits
allowed     0.19
allowed     0.17
Allowed     0.14
permitted   0.14
Allowed     0.13
before      0.13
maximum     0.12
tolerated   0.11
Maximum     0.11
允          0.11
Activations Density: 0.112%