INDEX
Explanations
words related to the characteristics and behaviors of people
statements about human traits or behaviors and their tendencies
Negative Logits
Token          Logit
osate          -0.81
ospons         -0.80
bernatorial    -0.75
Anniversary    -0.75
andise         -0.73
ornia          -0.71
legality       -0.70
inia           -0.69
reau           -0.67
tains          -0.66
Positive Logits
Token          Logit
aware          1.19
incapable      1.18
happiest       1.18
smarter        1.17
unaware        1.15
accustomed     1.13
afraid         1.11
happier        1.11
obsessed       1.10
willing        1.09
Activations Density 0.256%
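
One way to read the density figure, as a rough sketch: assuming it is the share of sampled token positions on which the feature fires (activation above zero), it could be computed as below. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: "density" means the fraction of sampled token positions
    where the feature is active; the dashboard may define it differently.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 100,000 token positions, 256 of them active.
sample = [0.0] * 99_744 + [1.5] * 256
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.256% would mean the feature is active on roughly 1 in 390 token positions.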