Explanations
phrases related to opinions or positions on certain issues
expressions of opinions or positions on various subjects
Negative Logits
NetMessage   -0.85
duct         -0.69
cing         -0.68
Tycoon       -0.68
STON         -0.67
athan        -0.65
esters       -0.64
rafted       -0.63
amn          -0.63
IVERS        -0.63
Positive Logits
stance       1.30
stances      1.16
toward       0.92
posture      0.86
towards      0.86
positions    0.85
position     0.85
reversal     0.78
views        0.75
pledge       0.74
Activations Density 0.024%