INDEX
Explanations
phrases indicating negative judgments or situations
mentions of the word "poor."
New Auto-Interp
Negative Logits
Token                      Logit
ategory                    -0.99
thus                       -0.71
CU                         -0.70
SPONSORED                  -0.68
theless                    -0.67
BuyableInstoreAndOnline    -0.67
Pi                         -0.67
auer                       -0.66
Laughs                     -0.66
natureconservancy          -0.65
Positive Logits
Token                      Logit
die                        0.92
dies                       0.89
quality                    0.88
souls                      0.88
quality                    0.84
luck                       0.84
sap                        0.81
grades                     0.81
performers                 0.79
imitation                  0.78
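The two lists above are the feature's most negative and most positive logit effects. Below is a minimal sketch of how such lists can be selected. It assumes each token has already been assigned one logit-effect score, and it stores those scores in a plain dict. The helper name `top_and_bottom` and the parameter `k` are hypothetical and do not come from the dashboard's own code.

```python
def top_and_bottom(logit_effects, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    `logit_effects` is a hypothetical dict mapping token strings to the
    feature's logit-effect score for that token.
    """
    # Sort ascending by score, so the most negative entries come first.
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]           # lowest scores, most negative first
    positive = ranked[::-1][:k]     # highest scores, most positive first
    return negative, positive


# A few (token, score) pairs copied from the tables above.
effects = {
    "ategory": -0.99, "thus": -0.71, "CU": -0.70,
    "die": 0.92, "dies": 0.89, "luck": 0.84,
}
neg, pos = top_and_bottom(effects, k=2)
print(neg)  # [('ategory', -0.99), ('thus', -0.71)]
print(pos)  # [('die', 0.92), ('dies', 0.89)]
```

A dict cannot hold two entries under the same key. The dashboard shows "quality" twice, which suggests two distinct tokens that differ only in invisible characters such as a leading space. A real implementation would therefore key on token IDs rather than on display strings.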
Activations Density 0.036%
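One way to arrive at a density figure like this is sketched below. It assumes that density means the percentage of tokens in the evaluation data on which the feature's activation is strictly positive. The function name `activation_density`, the `threshold` parameter, and the token counts in the example are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumes density = (number of active tokens) / (total tokens) * 100.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 36 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(36):
    acts[i] = 1.0
print(f"Activation density: {activation_density(acts):.3f}%")  # Activation density: 0.036%
```

A density this low means the feature is silent on almost every token and fires only in rare contexts.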