INDEX
NEGATIVE LOGITS
Token       Logit
mâ          -0.25
None        -0.25
çIJĨæĥ³çļĦ  -0.25
Subject     -0.24
Reich       -0.24
*dt         -0.24
çĪ±ä½ł      -0.24
sse         -0.23
arget       -0.23
eron        -0.23
POSITIVE LOGITS
Token       Logit
vern        0.28
validators  0.27
ounter      0.26
æijĩ         0.26
nesia       0.26
å¿ĥ         0.25
蹦          0.25
sher        0.25
inds        0.25
æIJĸ        0.25
Activations Density: 0.002%