INDEX
Explanations
text related to conspiracy theories
Head Attr Weights
0:0.02
1:0.02
2:0.09
3:0.05
4:0.05
5:0.04
6:0.14
7:0.04
8:0.04
9:0.04
10:0.22
11:0.19
Negative Logits
irlf          -1.37
<[            -1.24
OH            -1.23
tch           -1.20
SAY           -1.20
alsh          -1.18
orgetown      -1.18
rocal         -1.18
nsic          -1.16
othy          -1.16
Positive Logits
/)            1.23
/"            1.23
Chaser        1.20
fing          1.18
Jaguar        1.13
modification  1.13
leagues       1.08
starter       1.07
cubes         1.07
indust        1.07
Activations Density: 0.005%