Explanations
phrases related to surveillance and privacy concerns
Head Attr Weights
0:0.03
1:0.01
2:0.20
3:0.13
4:0.06
5:0.05
6:0.02
7:0.06
8:0.07
9:0.09
10:0.11
11:0.11
Negative Logits
issance      -1.42
�            -1.38
ּ            -1.34
Lifetime     -1.33
etric        -1.30
soever       -1.23
Badge        -1.22
ishable      -1.21
Mesh         -1.18
Exhibition   -1.15
Positive Logits
complain     1.55
blame        1.45
untled       1.43
livious      1.40
boycot       1.35
themselves   1.34
disagree     1.34
boycott      1.32
their        1.31
misled       1.30
Activations Density 0.352%