INDEX
Explanations
references to specific individuals and their actions or attributes
Head Attr Weights
0:0.05
1:0.03
2:0.31
3:0.04
4:0.10
5:0.07
6:0.03
7:0.03
8:0.08
9:0.15
10:0.05
11:0.02
Negative Logits
ikuman    -1.31
ormal     -1.31
oise      -1.27
ICAN      -1.25
onen      -1.25
�         -1.24
izz       -1.24
kov       -1.24
anche     -1.22
ocking    -1.19
Positive Logits
resil     1.41
nesday    1.41
ktop      1.39
ankind    1.31
ontent    1.26
poles     1.17
ibaba     1.12
bane      1.08
Faust     1.08
shakes    1.07
Activations Density 0.006%