INDEX
Explanations
references to the 9/11 attacks and related conspiracy theories
Negative Logits
itably      -0.75
iating      -0.71
iated       -0.71
holders     -0.70
ivari       -0.70
iator       -0.67
itable      -0.67
enture      -0.66
pmwiki      -0.66
iations     -0.65
Positive Logits
9999        1.36
999         1.27
06          1.22
090         1.20
07          1.14
08          1.12
03          1.10
04          1.08
02          1.04
09          1.02
Activations Density 0.065%