Explanations
words related to negative opinions or criticism
negative descriptors, particularly related to the term "horrible."
Negative Logits
Southwest      -0.78
RIC            -0.67
camp           -0.66
Qian           -0.65
Northwest      -0.64
ista           -0.64
pillow         -0.62
scholarship    -0.62
north          -0.60
¶              -0.60
Positive Logits
kefeller       0.87
terday         0.81
rible          0.78
theless        0.77
edom           0.75
xon            0.75
--+            0.73
arten          0.72
bley           0.72
Gaal           0.70
Activations Density 0.034%