INDEX
Explanations
phrases related to complaints or objections
references to conspiracy theories or questionable claims about individuals or situations
Negative Logits
nces        -0.86
lasses      -0.76
xual        -0.75
rals        -0.71
Mines       -0.68
itals       -0.67
ney         -0.66
Morgan      -0.65
nings       -0.64
vest        -0.64
Positive Logits
lication    0.73
-+-+-+-+    0.73
cloth       0.71
isbury      0.69
outgoing    0.68
beginnings  0.66
entimes     0.66
aho         0.65
osal        0.65
endi        0.64
Activations Density: 0.030%