INDEX
Explanations
words related to negative attributes or actions
terms associated with negative behaviors or conditions
Negative Logits
GOODMAN       -0.85
quart         -0.73
Hemp          -0.72
avez          -0.71
anchester     -0.70
ĸļ            -0.70
arate         -0.69
Occupations   -0.68
bleacher      -0.67
ĺħ            -0.65
Positive Logits
cffff         0.93
ness          0.87
nesses        0.85
foolish       0.82
ly            0.80
é¾įå          0.72
itous         0.71
cia           0.70
arrogance     0.70
modesty       0.69
Activations Density: 0.014%