INDEX
Explanations
adjectives and descriptions related to people and their behaviors
sociopolitical labels and identities
Negative Logits
tions       -0.68
partName    -0.62
GOODMAN     -0.61
tails       -0.58
iland       -0.57
ancies      -0.55
mentioned   -0.54
uyomi       -0.52
mares       -0.52
waiver      -0.52
Positive Logits
inferior     0.70
discipl      0.66
underdog     0.65
unworthy     0.65
undermin     0.62
achievable   0.61
martyr       0.60
deserving    0.60
savior       0.59
superior     0.58
Activations Density: 0.656%