INDEX
Explanations
proper nouns, specifically names of individuals
Head Attr Weights
0:0.03
1:0.02
2:0.04
3:0.05
4:0.04
5:0.04
6:0.37
7:0.12
8:0.05
9:0.07
10:0.05
11:0.04
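
The weights above are listed as head:weight pairs. As a minimal sketch of how they can be read, the snippet below parses them and reports which head carries the largest attribution. The names `raw`, `parse_weights`, and `top_head` are hypothetical and not part of the dashboard, and treating the pairs as a per-head share of attribution is an assumption about what the numbers mean.

```python
# Head attribution weights copied from the list above as "head:weight" pairs.
raw = (
    "0:0.03 1:0.02 2:0.04 3:0.05 4:0.04 5:0.04 "
    "6:0.37 7:0.12 8:0.05 9:0.07 10:0.05 11:0.04"
)


def parse_weights(text):
    """Turn whitespace-separated "head:weight" pairs into a dict {head: weight}."""
    weights = {}
    for item in text.split():
        head, value = item.split(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_weights(raw)

# Head with the largest listed weight.
top_head = max(weights, key=weights.get)

# Sum of the listed weights (the rounded values need not add up to 1).
total = sum(weights.values())

print(f"top head: {top_head}, weight: {weights[top_head]:.2f}")
print(f"sum of listed weights: {total:.2f}")
```

On the listed values, head 6 stands out with 0.37, roughly three times the next largest weight (head 7 at 0.12).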
Negative Logits
entially   -1.31
uously     -1.19
AST        -1.17
Fas        -1.16
KT         -1.13
depend     -1.13
FU         -1.12
cific      -1.11
Healer     -1.10
ruary      -1.10
Positive Logits
roth          1.69
zan           1.47
zon           1.45
acher         1.43
backer        1.36
adiq          1.24
cler          1.23
termination   1.23
itans         1.23
sche          1.23
Activations Density 0.007%