Explanations
geographical locations and historical landmarks
Head Attribution Weights
head 0:  0.02
head 1:  0.01
head 2:  0.16
head 3:  0.09
head 4:  0.24
head 5:  0.04
head 6:  0.06
head 7:  0.11
head 8:  0.04
head 9:  0.04
head 10: 0.07
head 11: 0.06
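The page does not say how these per-head weights are computed. The displayed values sum to 0.94 rather than 1, so they may be rounded or normalized differently.

One plausible scheme is sketched below. It assumes the feature's decoder row spans the concatenated outputs of all heads (n_heads slices of width d_head). Each head's weight is that slice's L2 norm as a fraction of the summed norms. The function name and this normalization are hypothetical illustrations, not the dashboard's confirmed method.

```python
import math


def head_attribution_weights(decoder_row, n_heads):
    """Hypothetical scheme: split one feature's decoder row into equal
    per-head slices and return each slice's L2 norm as a fraction of
    the total norm across heads."""
    if n_heads <= 0 or len(decoder_row) % n_heads != 0:
        raise ValueError("decoder row length must be divisible by n_heads")
    d_head = len(decoder_row) // n_heads
    norms = []
    for h in range(n_heads):
        chunk = decoder_row[h * d_head:(h + 1) * d_head]
        norms.append(math.sqrt(sum(x * x for x in chunk)))
    total = sum(norms)
    if total == 0:
        return [0.0] * n_heads  # an all-zero row has no attribution
    return [n / total for n in norms]


# Toy example: 3 heads with d_head = 2; slice norms are 5, 0 and 10.
weights = head_attribution_weights([3.0, 4.0, 0.0, 0.0, 6.0, 8.0], n_heads=3)
print([round(w, 3) for w in weights])
```

Under this scheme the weights always sum to 1. In the toy example, head 2 receives two thirds of the attribution because its slice has twice the norm of head 0's.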
Negative Logits
seless       -1.79
oyal         -1.53
remorse      -1.52
++)          -1.49
compliant    -1.46
inkle        -1.45
indisp       -1.42
impartial    -1.39
guilt        -1.38
dissenting   -1.35
Positive Logits
grab         1.73
cair         1.69
yip          1.64
ewater       1.61
atown        1.56
airst        1.53
ansas        1.52
Publishers   1.52
orks         1.51
erenn        1.47
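Logit tables like the two above are commonly produced by scoring every vocabulary token by the dot product of the feature's decoder direction with that token's unembedding vector. The lowest scores are listed as negative and the highest as positive.

The sketch below shows that ranking under stated assumptions. It assumes the decoder direction has already been mapped into the residual stream (for an attention-output feature, through W_O). It also ignores the final LayerNorm. The `unembed` mapping (token to unembedding vector) and the function name are hypothetical.

```python
def top_logit_effects(decoder_dir, unembed, k=10):
    """Hypothetical sketch: score each token by dot(decoder_dir, unembed[token])
    and return the k highest-scoring (positive) and k lowest-scoring
    (negative) tokens."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy vocabulary with 2-dimensional unembedding vectors.
toy_unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
pos, neg = top_logit_effects([2.0, 1.0], toy_unembed, k=1)
print(pos, neg)
```

In the toy case, token "a" gets the largest score (2.0) and "c" the smallest (-2.0). A real model would apply the same ranking over the full vocabulary with k = 10.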
Activation Density: 0.002%
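Activation density is usually read as the fraction of sampled tokens on which the feature fires. A density of 0.002% corresponds to 0.00002, or about 2 activating tokens per 100,000.

The sketch below assumes "fires" means an activation strictly above a threshold of 0. Both the threshold and the function name are illustrative assumptions, not the dashboard's documented definition.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`
    (assumed definition of 'active')."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 2 active tokens out of 100,000 gives 0.002% density.
acts = [0.0] * 99_998 + [0.7, 1.3]
print(f"{activation_density(acts) * 100:.3f}%")
```

Raising the threshold would count only stronger activations and lower the reported density. The number is therefore only comparable across features computed with the same definition.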