Explanations
instances of names associated with political or legal figures
Head Attr Weights
0:0.03
1:0.03
2:0.09
3:0.26
4:0.02
5:0.02
6:0.08
7:0.10
8:0.04
9:0.12
10:0.05
11:0.10
Negative Logits
PLIED         -1.42
IMAGES        -1.13
rentice       -1.10
CONTIN        -1.08
renheit       -1.01
isites        -1.00
Applied       -0.98
[+            -0.97
explanations  -0.97
EGIN          -0.97
Positive Logits
ansk          1.40
achev         1.26
uv            1.21
oslav         1.19
namese        1.18
quez          1.18
tein          1.17
ibrary        1.13
zbek          1.13
te            1.11
Activations Density 0.002%