INDEX
Explanations
references to groups or categories of people
Head Attr Weights
0:0.02
1:0.01
2:0.06
3:0.05
4:0.07
5:0.05
6:0.48
7:0.02
8:0.05
9:0.03
10:0.07
11:0.04
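
The weights above can be read programmatically. The following is a minimal sketch, assuming each line is a `head:weight` pair and that a larger weight means a larger share of the attribution for this feature; the names `raw` and `parse_head_weights` are illustrative and not part of any tool's API.

```python
# Head attribution weights copied verbatim from the listing above.
raw = """0:0.02 1:0.01 2:0.06 3:0.05 4:0.07 5:0.05
6:0.48 7:0.02 8:0.05 9:0.03 10:0.07 11:0.04"""


def parse_head_weights(text):
    """Parse whitespace-separated 'head:weight' pairs into a dict (hypothetical helper)."""
    weights = {}
    for pair in text.split():
        head, weight = pair.split(":")
        weights[int(head)] = float(weight)
    return weights


weights = parse_head_weights(raw)

# Head with the largest listed weight.
top_head = max(weights, key=weights.get)

# The listed values are rounded, so the total need not be exactly 1.
total = sum(weights.values())

print(top_head, weights[top_head], round(total, 2))  # 6 0.48 0.95
```

Under these assumptions, head 6 carries roughly half of the listed weight, and the twelve rounded values sum to 0.95 rather than 1.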
Negative Logits
iod: -1.37
fuse: -1.26
reuse: -1.25
ovsky: -1.24
translation: -1.24
chieve: -1.19
draw: -1.12
texture: -1.11
supervision: -1.11
epis: -1.10
Positive Logits
iris: 1.50
swayed: 1.48
violated: 1.42
esta: 1.42
oi: 1.37
oths: 1.37
ppers: 1.35
��: 1.31
swick: 1.31
upper: 1.30
Activations Density 0.064%