INDEX
Explanations
words and phrases associated with excuses and rationalizations
Head Attr Weights
0:0.08
1:0.03
2:0.30
3:0.07
4:0.16
5:0.05
6:0.02
7:0.02
8:0.06
9:0.08
10:0.05
11:0.03
Negative Logits
abase    -1.26
kit      -1.22
imir     -1.20
artisan  -1.17
yip      -1.16
ynt      -1.14
fram     -1.14
center   -1.13
endif    -1.12
arten    -1.12
Positive Logits
iencies  1.40
Args     1.28
FACE     1.24
�士      1.18
ttes     1.17
MENTS    1.15
stim     1.14
usions   1.13
�        1.11
Sad      1.09
Activations Density 0.004%