INDEX
Explanations
Terms related to the definition and discussion of "fake news."
Head Attribution Weights
Head  Weight
0     0.18
1     0.02
2     0.22
3     0.08
4     0.04
5     0.10
6     0.04
7     0.07
8     0.06
9     0.05
10    0.04
11    0.03
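One way to read the head weights listed above is to rank the heads by weight and total the listed values. The sketch below is illustrative only: `head_weights` and `top_heads` are hypothetical names introduced here, the values are copied from the listing, and nothing in it reflects how the dashboard itself computes these weights.

```python
# Head weights copied from the listing above (head index -> weight).
# The dict name and the helper below are hypothetical, for illustration only.
head_weights = {
    0: 0.18, 1: 0.02, 2: 0.22, 3: 0.08, 4: 0.04, 5: 0.10,
    6: 0.04, 7: 0.07, 8: 0.06, 9: 0.05, 10: 0.04, 11: 0.03,
}


def top_heads(weights, k):
    """Return the k (head, weight) pairs with the largest weights, descending."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    for head, weight in top_heads(head_weights, 3):
        print(f"head {head}: {weight:.2f}")
    # Total of the values as displayed, rounded to two decimals.
    print(f"listed total: {sum(head_weights.values()):.2f}")
```

Heads 2, 0, and 5 carry the three largest listed weights. The displayed values total 0.93 rather than 1.00, which may be an effect of rounding in the display.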
Negative Logits
Token         Logit
Ach           -2.80
Quart         -2.68
Hep           -2.47
dancer        -2.42
サーティワン  -2.40
VALUE         -2.31
Hes           -2.31
Bagg          -2.30
Henri         -2.29
Sacrifice     -2.29
Positive Logits
Token           Logit
Fake            3.96
disinformation  3.88
bots            3.82
spam            3.82
retweet         3.74
fake            3.66
hoax            3.49
debunk          3.37
Fake            3.37
CNN             3.31
Activations Density: 0.019%