Explanations
References to opinions or positions held on various issues
Negative Logits
NetMessage   -1.11
STON         -0.75
batch        -0.74
duct         -0.70
Sabha        -0.67
Towers       -0.67
issance      -0.65
ーティ       -0.65
Osc          -0.63
Sample       -0.63
Positive Logits
stance       1.29
stances      1.24
toward       1.03
towards      1.00
regarding    0.95
positions    0.89
views        0.88
against      0.86
favoring     0.83
viewpoint    0.82
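The dashboard does not state how these logit scores are computed. This sketch assumes a common convention for feature dashboards: take the feature's decoder direction, dot it with each vocabulary token's unembedding vector, and list the lowest- and highest-scoring tokens. It uses plain Python. The names `decoder_dir`, `unembed`, `feature_logit_effects` and `top_and_bottom` are hypothetical. The unembedding is stored as one row per token, and the toy numbers are illustrative only.

```python
from typing import List, Sequence, Tuple

def feature_logit_effects(
    decoder_dir: Sequence[float],          # hypothetical: feature decoder vector (d_model)
    unembed: Sequence[Sequence[float]],    # hypothetical: one unembedding row per vocab token
) -> List[float]:
    """Score each vocabulary token by the dot product of its unembedding
    row with the feature's decoder direction (assumed convention)."""
    return [sum(d * w for d, w in zip(decoder_dir, row)) for row in unembed]

def top_and_bottom(
    scores: Sequence[float],
    vocab: Sequence[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k lowest-scoring (negative) and k highest-scoring
    (positive) tokens, each list ordered from most extreme inward."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["alpha", "beta", "gamma", "delta"]
unembed = [[2.0, 5.0], [-1.0, 3.0], [0.5, -2.0], [-3.0, 1.0]]
scores = feature_logit_effects([1.0, 0.0], unembed)
negative, positive = top_and_bottom(scores, vocab, k=2)
print("Negative:", negative)  # [('delta', -3.0), ('beta', -1.0)]
print("Positive:", positive)  # [('alpha', 2.0), ('gamma', 0.5)]
```

Under this convention, a positive score means the feature pushes the model toward predicting that token, and a negative score means it pushes the model away from it. That reading fits this feature, whose strongest positive tokens (stance, toward, regarding) match its explanation.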
Activation density: 0.018%
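This sketch assumes that activation density means the percentage of sampled tokens on which the feature's activation is above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`
    (assumed definition of activation density)."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

# 18 firing tokens out of 100,000 gives a density of 0.018%.
acts = [0.0] * (100_000 - 18) + [1.0] * 18
print(f"{activation_density(acts):.3f}%")  # 0.018%
```

Under this definition, a density of 0.018% means the feature fires on roughly 1 token in 5,500, so it is a very sparse feature.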