Explanations
mentions of political figures and government actions, particularly related to investigations or requests
names of political figures and references to political events or entities
Negative Logits
Collective   -0.77
unity        -0.74
edit         -0.68
UE           -0.68
ept          -0.67
Guides       -0.67
hindsight    -0.67
vered        -0.66
PK           -0.65
itual        -0.64
Positive Logits
alde         0.79
icut         0.69
deserts      0.68
Jr           0.63
nen          0.63
ASHINGTON    0.61
ORGE         0.60
outl         0.60
ibia         0.60
taboola      0.60
Activations Density 0.264%