INDEX
Explanations
phrases related to societal issues and protests
Head Attr Weights
0:0.02
1:0.01
2:0.07
3:0.04
4:0.13
5:0.02
6:0.04
7:0.39
8:0.02
9:0.03
10:0.10
11:0.07
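
A minimal sketch for reading the list above, assuming each line is a `head:weight` pair where the key is an attention-head index and the value is that head's attribution weight. The function name `parse_head_weights` is hypothetical and is not taken from the page.

```python
# Head attribution weights copied from the list above ("head:weight" per line).
RAW_HEAD_WEIGHTS = """\
0:0.02
1:0.01
2:0.07
3:0.04
4:0.13
5:0.02
6:0.04
7:0.39
8:0.02
9:0.03
10:0.10
11:0.07
"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a dict mapping head index to weight.

    Hypothetical helper; assumes one pair per line and skips blank lines.
    """
    weights = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        head, value = line.split(":", 1)
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(RAW_HEAD_WEIGHTS)

# Head with the largest attribution weight (head 7, weight 0.39 in this list).
top_head = max(weights, key=weights.get)

# Sum of the displayed weights; they are rounded to two decimals,
# so the total need not be exactly 1.
total = sum(weights.values())

print(f"top head: {top_head} ({weights[top_head]:.2f}), total: {total:.2f}")
```

Read this way, a single head carries the largest share of the listed weight, with the rest spread thinly across the other eleven.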
Negative Logits
arnaev      -1.70
Trop        -1.47
illac       -1.47
ategory     -1.47
utical      -1.45
afort       -1.43
perture     -1.43
foundation  -1.41
Panama      -1.41
Americas    -1.40
Positive Logits
passers     2.21
loudly      1.91
incess      1.79
aloud       1.75
loud        1.73
noises      1.66
laughter    1.63
praises     1.60
louder      1.58
neigh       1.56
Activations Density 0.110%