Negative Logits
pton           -0.59
Canad          -0.54
Cameron        -0.54
plex           -0.51
atown          -0.49
disenfranch    -0.49
vier           -0.49
oleon          -0.48
Horowitz       -0.48
abwe           -0.48
Positive Logits
ŀ              0.70
ness           0.63
ú              0.62
ener           0.61
ë              0.60
itled          0.59
emption        0.57
ure            0.57
û              0.57
raction        0.56
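Dashboards of this kind commonly build the negative and positive logit lists by taking the dot product of a feature's output direction with each token's column of the model's unembedding matrix, then keeping the most extreme entries. This page does not state its method, so the sketch below assumes that convention. `direction`, `unembed`, and `vocab` are hypothetical toy values, not weights from the real model.

```python
from typing import List, Tuple


def logit_effects(direction: List[float], unembed: List[List[float]]) -> List[float]:
    """Dot the feature direction (length d_model) with every vocabulary column
    of the unembedding, stored as rows of shape [d_model][vocab_size]."""
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def extremes(
    vocab: List[str], effects: List[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
unembed = [
    [1.0, -1.0, 0.5, 0.0],
    [0.0, 2.0, -0.5, 1.0],
]
direction = [0.5, -0.25]

effects = logit_effects(direction, unembed)
neg, pos = extremes(vocab, effects, k=2)
print(neg)  # [('b', -1.0), ('d', -0.25)]
print(pos)  # [('a', 0.5), ('c', 0.375)]
```

With a real model the same two steps run over the full vocabulary with k = 10, which matches the length of each list on this page.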
Activation Density: 0.493%
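Assuming the density figure means the share of token positions where the feature fires (activation above zero), 0.493% corresponds to roughly 4.93 firing positions per 1,000 tokens. The sketch below shows that calculation; the `activation_density` helper, its `threshold` parameter, and the activation values are hypothetical illustrations, not data from this dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation is strictly above `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


# Hypothetical activations: 1,000 token positions, 5 of which fire.
acts = [0.0] * 995 + [1.2, 0.4, 2.0, 0.1, 0.9]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.500%
```

Raising `threshold` above zero would count only stronger activations and lower the reported density.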