Negative Logits
Token   Logit
undry   -0.19
anson   -0.17
alon    -0.15
alian   -0.15
amax    -0.15
alam    -0.15
urent   -0.14
TAG     -0.14
rope    -0.14
ipv     -0.14

Positive Logits
Token   Logit
ess     0.31
esses   0.31
cub     0.26
lion    0.24
ardo    0.23
Lion    0.22
mane    0.22
Cub     0.20
lions   0.20
ESS     0.19
Activations Density: 0.007%