Explanations
statements or phrases that express personal feelings and experiences related to decision-making or reflection
Head Attr Weights
0:0.06
1:0.02
2:0.07
3:0.15
4:0.03
5:0.06
6:0.02
7:0.05
8:0.02
9:0.02
10:0.43
11:0.02
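The weights above can be read programmatically. The following is a minimal sketch, assuming each entry is `head_index:weight` for one of twelve attention heads; the variable names are hypothetical and not part of the dashboard. It only picks out the head with the largest listed weight and sums the listed values.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: indices 0-11 refer to attention heads in a single layer.
head_attr_weights = {
    0: 0.06, 1: 0.02, 2: 0.07, 3: 0.15, 4: 0.03, 5: 0.06,
    6: 0.02, 7: 0.05, 8: 0.02, 9: 0.02, 10: 0.43, 11: 0.02,
}

# Head with the largest listed weight.
top_head = max(head_attr_weights, key=head_attr_weights.get)

# Sum of the listed (rounded) weights.
total = sum(head_attr_weights.values())

print(f"top head: {top_head} ({head_attr_weights[top_head]:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

With the values as listed, head 10 carries the largest weight (0.43), and the rounded weights sum to about 0.95.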
Negative Logits
rehensive  -2.10
cern  -2.04
uggest  -2.03
rief  -1.97
estic  -1.96
ounces  -1.96
YN  -1.93
obbies  -1.89
oller  -1.88
ATT  -1.88
Positive Logits
deserved  3.00
lucky  2.95
deserving  2.81
deserve  2.77
better  2.56
correctly  2.50
ripe  2.48
perfect  2.44
advant  2.43
entit  2.41
Activations Density 0.255%