Explanations
pronouns and specific references to individuals
Head Attribution Weights
Head    Weight
0       0.02
1       0.02
2       0.03
3       0.05
4       0.04
5       0.04
6       0.47
7       0.05
8       0.04
9       0.06
10      0.06
11      0.08
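For features of an attention-layer SAE, per-head weights like these are often obtained by splitting the feature's decoder vector into one slice per head and comparing the slice norms. The twelve values above sum to 0.96, which is consistent with a normalization to 1 after rounding. The sketch below is a hypothetical illustration under that assumption: `head_attribution` is an invented helper, and it assumes the decoder row lives in the layer's concatenated per-head (z) space with contiguous head blocks. Whether the dashboard uses plain or squared norms is not stated here.

```python
import numpy as np


def head_attribution(w_dec_row: np.ndarray, n_heads: int) -> np.ndarray:
    """Hypothetical helper: share of a feature's decoder norm per attention head.

    Assumes w_dec_row lives in the layer's concatenated z-space, laid out as
    n_heads contiguous blocks of d_head entries each.
    """
    per_head = w_dec_row.reshape(n_heads, -1)   # (n_heads, d_head)
    norms = np.linalg.norm(per_head, axis=1)    # L2 norm of each head's block
    return norms / norms.sum()                  # normalize so the weights sum to 1


# Toy example: random decoder row in which head 6's block is scaled up 10x.
rng = np.random.default_rng(0)
n_heads, d_head = 12, 64
row = rng.normal(scale=0.1, size=n_heads * d_head)
row[6 * d_head:7 * d_head] *= 10
weights = head_attribution(row, n_heads)
for head, w in enumerate(weights):
    print(f"{head}:{w:.2f}")
```

In the toy example a single dominant head produces a profile similar to the one above, where head 6 carries 0.47 of the weight.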
Negative Logits
Token           Logit
hower           -1.34
shire           -1.33
pulp            -1.29
ATTLE           -1.15
advertisement   -1.13
MJ              -1.12
Violet          -1.11
reviewed        -1.10
gates           -1.06
GREEN           -1.05
Positive Logits
Token           Logit
EStream         1.49
agog            1.47
opian           1.42
ensitive        1.37
ukong           1.36
idi             1.36
xual            1.36
amia            1.35
uddin           1.34
ovi             1.30
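Lists of most-promoted and most-suppressed tokens like the two above are commonly produced by projecting a feature's output direction onto the unembedding matrix and then taking the extremes. The sketch below is a minimal, hypothetical version of that idea. `top_logit_effects` is an invented helper, and the toy matrices are made up for illustration. It assumes the decoder vector is already in residual-stream space. For an attention-layer feature, a z-space decoder vector would first have to be mapped through the output projection, and a real dashboard may also account for the final LayerNorm.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Hypothetical helper: tokens whose logits a feature direction most
    decreases and most increases.

    w_dec_row: (d_model,) decoder vector, assumed already in residual-stream space
    w_u:       (d_model, d_vocab) unembedding matrix
    vocab:     list of d_vocab token strings
    """
    logits = w_dec_row @ w_u                  # (d_vocab,) direct effect on each logit
    order = np.argsort(logits)                # token indices, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with d_model = 2 and a five-token vocabulary.
vocab = ["a", "b", "c", "d", "e"]
w_u = np.array([[3.0, -1.0, 0.0, 2.0, -2.0],
                [0.5, 0.5, 0.5, 0.5, 0.5]])
feature_dir = np.array([1.0, 0.0])
neg, pos = top_logit_effects(feature_dir, w_u, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Because only the direct path to the logits is measured, tokens in these lists can look unrelated to the feature's explanation, as many of the subword fragments above do.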
Activation Density: 0.001%
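Activation density is typically the percentage of tokens in a sample dataset on which the feature fires. At 0.001%, this feature would be active on roughly 1 token in 100,000 (0.001 / 100 = 1e-5). The minimal sketch below assumes a firing threshold of zero. The name `activation_density` is hypothetical, and a dashboard may use a different threshold or token sample.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens on which a feature fires."""
    return 100.0 * float(np.mean(acts > threshold))


# A feature active on 1 of 100,000 tokens has a density of 0.001%.
acts = np.zeros(100_000)
acts[42] = 3.7
print(f"{activation_density(acts):.3f}%")
```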