Explanations
the pronoun "who" in various contexts, often relating to identification or description of individuals
Head Attr Weights
0:0.02
1:0.01
2:0.19
3:0.09
4:0.06
5:0.04
6:0.03
7:0.03
8:0.07
9:0.04
10:0.12
11:0.25
Negative Logits
soc         -1.72
arts        -1.71
ographies   -1.61
lists       -1.53
hetics      -1.51
adium       -1.50
eals        -1.50
raft        -1.50
Soc         -1.49
erning      -1.48
Positive Logits
projectile  1.62
flanked     1.62
angered     1.56
scapego     1.53
unemploy    1.50
besie       1.49
provoked    1.46
forcefully  1.45
eatures     1.45
annexed     1.45
Activations Density 0.030%