INDEX
Explanations
proper nouns and names, primarily related to people
Head Attr Weights
0:0.03
1:0.01
2:0.05
3:0.05
4:0.04
5:0.04
6:0.42
7:0.07
8:0.06
9:0.07
10:0.06
11:0.05
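A minimal sketch of how the weights listed above might be summarized, assuming each line reads as `head index:weight` (the listing itself does not say so). The names `raw` and `parse_head_weights` are hypothetical and exist only for this illustration.

```python
# Head attribution weights copied verbatim from the listing above.
# Assumption: each line is "<head index>:<weight>".
raw = """0:0.03
1:0.01
2:0.05
3:0.05
4:0.04
5:0.04
6:0.42
7:0.07
8:0.06
9:0.07
10:0.06
11:0.05"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a dict mapping head index to weight."""
    weights = {}
    for line in text.splitlines():
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(raw)

# Head with the largest listed weight, and the total of the listed weights.
top_head = max(weights, key=weights.get)
total = sum(weights.values())

print(f"dominant head: {top_head} (weight {weights[top_head]:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

Under that reading, head 6 carries the largest single weight (0.42), and the twelve listed values sum to about 0.95, so the figures appear to be rounded.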
Negative Logits
VICE:-1.22
TYPE:-1.19
lift:-1.17
pection:-1.14
lift:-1.12
PLIED:-1.12
iT:-1.11
gem:-1.10
guide:-1.10
exceptions:-1.09
Positive Logits
ioxide:1.47
ヘラ:1.46
ciating:1.43
apons:1.38
iewicz:1.36
opolis:1.34
��:1.33
thia:1.33
emort:1.32
inous:1.30
Activations Density 0.003%