Explanations
Phrases that indicate deception or manipulation regarding historical narratives or beliefs
Being tricked, fooled, or deceived
Falling for deception
Negative Logits
rrggbb        -0.46
usercontent   -0.43
httphttps     -0.41
ียญ           -0.38
MockMvc       -0.37
belangrij     -0.37
UAGES         -0.36
ніципа        -0.36
displeasure   -0.36
twimg         -0.36
Positive Logits
naive          1.23
gul            1.17
fooled         1.16
naïve          1.11
naï            1.05
unsuspecting   1.01
foolish        0.99
fool           0.93
deceived       0.91
naive          0.90
Activations Density 0.396%