INDEX
Explanations
words related to negative emotions or qualities such as "awful," "horrible," and "terrible."
terms expressing negative evaluations or descriptions of experiences, conditions, or entities
Negative Logits
ership        -0.87
pai           -0.86
irs           -0.82
ilus          -0.77
idable        -0.76
essor         -0.74
eters         -0.72
andr          -0.71
restricted    -0.71
olate         -0.70
Positive Logits
smelling      0.85
sounding      0.83
nightmares    0.82
nightmare     0.80
tasting       0.78
horrible      0.76
manners       0.75
headache      0.74
ordeal        0.74
blow          0.73
Activations Density 0.035%
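
The density figure above is commonly read as the share of token positions on which the feature fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of token positions where the
    feature is active, expressed as a percentage.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 7 of 20,000 token positions.
sample = [0.0] * 19993 + [1.2] * 7
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.035%"
```

A density this low would mean the feature is active on roughly one token in every few thousand, which fits a feature tied to a narrow set of strongly negative words.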