INDEX
Explanations
phrases related to political rhetoric and misinformation
statements about misconceptions and myths regarding bisexuality
Negative Logits
Token       Logit
hess        -0.69
Grateful    -0.66
ktop        -0.62
atra        -0.62
contrace    -0.62
atri        -0.61
foreseen    -0.61
sung        -0.60
Rhythm      -0.60
syn         -0.60
Positive Logits
Token          Logit
falsely        0.90
excuses        0.89
nonsense       0.87
justification  0.85
ignorance      0.85
ignor          0.85
scapego        0.84
delusions      0.83
baseless       0.83
slander        0.81
Activations Density: 0.691%