INDEX
Explanations
expressions related to cultural or ethnic identity
Head Attr Weights
0:0.03
1:0.02
2:0.07
3:0.05
4:0.04
5:0.04
6:0.45
7:0.04
8:0.03
9:0.06
10:0.07
11:0.05
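
The weights above can be summarized with a short Python sketch. It assumes the twelve `head:weight` pairs are per-head attribution weights for a single attention layer, which is an interpretation of the "Head Attr Weights" label rather than something the listing states; the variable names are illustrative only.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each entry is one attention head's share of the attribution.
head_weights = {
    0: 0.03, 1: 0.02, 2: 0.07, 3: 0.05, 4: 0.04, 5: 0.04,
    6: 0.45, 7: 0.04, 8: 0.03, 9: 0.06, 10: 0.07, 11: 0.05,
}

# The listed values are rounded, so the total need not be exactly 1.
total = sum(head_weights.values())

# Head with the largest weight, and its share of the listed total.
top_head = max(head_weights, key=head_weights.get)
top_share = head_weights[top_head] / total

print(f"dominant head: {top_head}")
print(f"weight: {head_weights[top_head]:.2f} of a listed total of {total:.2f}")
print(f"share of listed total: {top_share:.1%}")
```

Under that reading, head 6 dominates: its weight of 0.45 is more than six times the next-largest values (0.07 for heads 2 and 10).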
Negative Logits
-+-+           -1.49
NECT           -1.31
pse            -1.23
vigilance      -1.20
olesterol      -1.18
ICAN           -1.16
ATING          -1.14
vertisements   -1.14
Services       -1.13
estic          -1.13
Positive Logits
�              1.75
Rebels         1.36
bye            1.35
rette          1.28
ghan           1.25
emp            1.20
forgiven       1.18
quez           1.17
undle          1.15
ndra           1.15
Activations Density 0.001%