INDEX
Explanations
proper nouns, particularly names of individuals
Head Attr Weights
0:0.02
1:0.03
2:0.04
3:0.05
4:0.05
5:0.04
6:0.41
7:0.08
8:0.05
9:0.06
10:0.06
11:0.05
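As a rough illustration of how to read the head attribution weights above, the sketch below picks out the head with the largest weight and its share of the listed total. The values are copied from the listing. The names `head_weights` and `dominant_head` are hypothetical and not part of any tool's API, and the sketch assumes the weights can be compared directly to one another.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.02, 1: 0.03, 2: 0.04, 3: 0.05, 4: 0.05, 5: 0.04,
    6: 0.41, 7: 0.08, 8: 0.05, 9: 0.06, 10: 0.06, 11: 0.05,
}


def dominant_head(weights):
    """Return (head, weight, share) for the head with the largest weight.

    `share` is that weight divided by the sum of all listed weights.
    Hypothetical helper for illustration only.
    """
    total = sum(weights.values())
    head = max(weights, key=weights.get)
    return head, weights[head], weights[head] / total


head, weight, share = dominant_head(head_weights)
print(f"head {head}: weight {weight:.2f}, share of listed total {share:.1%}")
```

Computing the share against the listed total avoids assuming that the weights sum to 1.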
Negative Logits
��
-1.39
�
-1.35
Struggle
-1.29
ModLoader
-1.25
��
-1.22
largeDownload
-1.21
>>\
-1.21
mma
-1.20
irez
-1.20
\\\\
-1.16
Positive Logits
rison
1.55
ican
1.42
iewicz
1.40
umatic
1.39
zon
1.38
igon
1.31
opol
1.30
witz
1.29
linger
1.28
acher
1.27
Activations Density 0.004%