INDEX
Explanations
proper nouns, particularly names of individuals and brands
Head Attr Weights
0:0.05
1:0.09
2:0.13
3:0.03
4:0.02
5:0.03
6:0.05
7:0.02
8:0.02
9:0.03
10:0.44
11:0.02
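The weights above are easier to compare when read programmatically. The following is a minimal sketch, assuming the `head:weight` line format shown in the list; the names `RAW` and `parse_head_weights` are hypothetical and are not part of the dashboard. It picks out the head with the largest attribution weight and reports its share of the listed total. The listed values are rounded, so they need not sum to exactly 1.

```python
# Head attribution weights copied from the list above ("head:weight" per line).
# RAW and parse_head_weights are illustrative names, not a dashboard API.
RAW = """0:0.05
1:0.09
2:0.13
3:0.03
4:0.02
5:0.03
6:0.05
7:0.02
8:0.02
9:0.03
10:0.44
11:0.02"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a dict mapping head index to weight."""
    weights = {}
    for line in text.splitlines():
        head, _, value = line.partition(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(RAW)

# Head with the largest attribution weight.
top_head = max(weights, key=weights.get)

# Share of the listed total carried by that head (listed values are rounded).
top_share = weights[top_head] / sum(weights.values())

print(f"top head: {top_head}, weight: {weights[top_head]}, share: {top_share:.2f}")
```

On the values listed here, head 10 carries the largest weight (0.44), more than three times the next largest (head 2, 0.13).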
Negative Logits
Cth            -2.38
Boko           -2.10
independence   -2.08
Anglic         -2.01
millenn        -1.97
Odin           -1.95
Unity          -1.93
Uri            -1.89
Thom           -1.88
Asc            -1.86
Positive Logits
z              4.39
zos            3.63
zes            3.32
ez             3.19
zed            3.17
zek            3.12
zik            3.09
zon            3.09
zie            3.08
zz             3.07
Activations Density 0.006%