INDEX
Explanations
mentions of political controversies or allegations
Negative Logits
UTO       -0.17
UNION     -0.15
UDA       -0.15
ffen      -0.15
anye      -0.14
arket     -0.14
uto       -0.13
SWITCH    -0.13
mailto    -0.13
utf       -0.13
Positive Logits
intelligence      0.40
Intelligence      0.35
intelligence      0.32
CIA               0.32
Pentagon          0.30
State             0.29
Defense           0.28
intel             0.27
administration    0.26
White             0.26
Activations Density 0.183%
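
The page does not define the density figure. One common reading is the share of sampled token positions on which the feature activates at all. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and do not come from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active. The source page does not state this definition.
    """
    if not activations:
        return 0.0
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 3 active positions out of 1000.
sample = [0.0] * 997 + [0.4, 1.2, 0.7]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.183% would mean the feature is active on roughly 18 of every 10,000 sampled tokens.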