Explanations
proper nouns, specifically names of individuals
Head Attr Weights
0:0.03
1:0.01
2:0.05
3:0.05
4:0.05
5:0.04
6:0.36
7:0.15
8:0.05
9:0.06
10:0.06
11:0.04
Negative Logits
ruary          -1.37
unreliable     -1.36
pection        -1.35
specializing   -1.34
steroid        -1.33
geries         -1.33
reliable       -1.32
fart           -1.29
gimm           -1.26
lackluster     -1.24
Positive Logits
isen            1.68
�               1.56
Guard           1.43
icket           1.43
Jr              1.38
coins           1.32
ickets          1.30
AX              1.30
Rails           1.30
iak             1.30
Activation Density: 0.001%