NEGATIVE LOGITS
Darkness    -0.76
lihood      -0.74
FORE        -0.69
donor       -0.66
··          -0.66
Dangerous   -0.65
livest      -0.65
IST         -0.65
Farn        -0.64
âĸ¬         -0.63
POSITIVE LOGITS
oise        1.52
urous       1.50
uring       1.15
ured        1.11
illas       1.09
urers       1.08
imer        1.08
uous        1.07
eur         1.03
ures        1.03
Activations Density 0.003%