INDEX
Explanations
references to older individuals and their interactions within various contexts
Head Attr Weights
0:0.03
1:0.03
2:0.08
3:0.04
4:0.05
5:0.04
6:0.38
7:0.05
8:0.05
9:0.03
10:0.09
11:0.07
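The weights above are easier to compare once they are summed and ranked. The following is a minimal sketch, not part of the original dashboard: it copies the twelve listed values into a dictionary and reports the largest one and its share of the listed total. The names `head_weights`, `dominant_head`, and `share_of_total` are hypothetical, and the source does not say whether the weights are meant to sum to 1 (the displayed values are rounded).

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.03, 1: 0.03, 2: 0.08, 3: 0.04, 4: 0.05, 5: 0.04,
    6: 0.38, 7: 0.05, 8: 0.05, 9: 0.03, 10: 0.09, 11: 0.07,
}


def dominant_head(weights):
    """Return (head_index, weight) for the head with the largest weight."""
    head = max(weights, key=weights.get)
    return head, weights[head]


def share_of_total(weights, head):
    """Return the given head's weight as a fraction of the sum of listed weights."""
    return weights[head] / sum(weights.values())


if __name__ == "__main__":
    head, weight = dominant_head(head_weights)
    print(f"dominant head: {head} (weight {weight:.2f})")
    print(f"sum of listed weights: {sum(head_weights.values()):.2f}")
    print(f"share of listed total: {share_of_total(head_weights, head):.1%}")
```

On these values, head 6 carries the largest weight by a wide margin; the remaining heads each sit at or below 0.09.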
Negative Logits
ongyang     -1.47
ccoli       -1.38
ixt         -1.35
uese        -1.30
ospons      -1.30
ocument     -1.25
ILA         -1.25
bably       -1.23
itionally   -1.19
ット        -1.18
Positive Logits
than        1.72
wiser       1.33
ner         1.31
types       1.24
rant        1.21
deb         1.20
Baptist     1.17
rants       1.17
cases       1.16
necks       1.14
Activations Density 0.014%