INDEX
Explanations
proper nouns, particularly names and titles
Head Attr Weights
0:0.15
1:0.17
2:0.06
3:0.07
4:0.02
5:0.02
6:0.09
7:0.10
8:0.03
9:0.03
10:0.17
11:0.03
Negative Logits
Lear: -3.73
Hearth: -3.50
Granger: -3.03
Bron: -3.02
intrins: -2.76
Latinos: -2.71
loops: -2.65
Malone: -2.62
LAT: -2.62
Bees: -2.61
Positive Logits
Fuk: 8.03
uk: 4.77
Muk: 4.38
Ruk: 4.13
Yuk: 3.85
Luk: 3.79
Suk: 3.41
uku: 3.40
Kum: 3.30
Sapp: 3.28
Activations Density: 0.001%