INDEX
Explanations
phrases related to praising or approval
Negative Logits
ther            -0.77
soDeliveryDate  -0.75
ueller          -0.74
ramid           -0.74
abouts          -0.72
itamin          -0.70
bang            -0.70
itol            -0.66
claimer         -0.66
few             -0.66
Positive Logits
ifully     0.94
eous       0.80
virtues    0.78
iful       0.72
bravery    0.69
fully      0.67
exemplary  0.67
ously      0.66
atory      0.65
rous       0.65
Activations Density: 0.107%