INDEX
Explanations
words related to positions or viewpoints on various issues or topics
references to positions or opinions on various issues
Negative Logits
Token         Logit
Interstitial  -0.80
NetMessage    -0.75
cing          -0.73
ined          -0.72
esters        -0.69
usted         -0.68
sample        -0.67
Sparkle       -0.65
nova          -0.63
ining         -0.62
Positive Logits
Token         Logit
stance        1.51
stances       1.30
reversal      0.88
posture       0.85
yip           0.84
position      0.82
plank         0.82
positions     0.78
towards       0.77
olicy         0.76
Activations Density: 0.010%
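
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of token positions where the feature's activation is above zero. The function name activation_density, the threshold parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 token out of 10,000.
acts = [0.0] * 9999 + [3.2]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.010% corresponds to the feature firing on roughly one token in ten thousand.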