Explanations
opinions expressed about political views and statements
Negative Logits
Faster       -0.67
delay        -0.63
aunts        -0.63
trap         -0.61
Furious      -0.59
asks         -0.59
gins         -0.58
rolled       -0.58
brut         -0.58
gi           -0.57
Positive Logits
editorial    0.94
opinions     0.86
ãĥĦ          0.74
affiliate    0.72
endorsement  0.72
Editorial    0.71
satire       0.69
opinion      0.68
EngineDebug  0.68
subjective   0.67
Activations Density 0.256%
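
The positive-logit token shown as ãĥĦ looks like a byte-level BPE vocabulary string rather than corrupted text. The sketch below assumes the model uses a GPT-2-style byte-to-unicode mapping (not confirmed by this page) and shows how such a string maps back to readable text. The helper names `bytes_to_unicode` and `decode_token` are illustrative only.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def decode_token(token: str) -> str:
    """Map a vocabulary string back to raw bytes, then decode those bytes as UTF-8."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĦ"))  # prints ツ (U+30C4, katakana "tsu")
```

Under that assumption the three characters correspond to the bytes E3 83 84, which is the UTF-8 encoding of ツ. Plain ASCII tokens such as `editorial` pass through unchanged.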