INDEX
Explanations
phrases related to misinformation and manipulation for political or personal gain
instances of deception or misleading information
Negative Logits
atonin     -0.81
ipeg       -0.80
ridor      -0.79
ixel       -0.78
cade       -0.78
ftime      -0.76
utra       -0.76
pring      -0.75
Aires      -0.75
enhagen    -0.73
Positive Logits
misplaced       1.09
incompetent     1.08
deceit          1.07
misunderstood   1.07
inadequ         1.05
misrepresent    1.03
inept           0.99
illeg           0.98
unworthy        0.98
misleading      0.98
Activations Density 0.355%