INDEX
NEGATIVE LOGITS
Token    Logit
kd       -0.06
[j       -0.06
drift    -0.06
.sd      -0.06
Couldn   -0.06
-road    -0.06
Beng     -0.06
Oh       -0.06
d        -0.06
Kn       -0.06
POSITIVE LOGITS
Token    Logit
ure      0.12
URE      0.11
are      0.11
atore    0.11
ARE      0.11
ore      0.11
ire      0.11
urre     0.10
re       0.10
Gore     0.09
ACTIVATIONS DENSITY: 0.194%