INDEX
Explanations
objects related to compliments or positive feedback
expressions of happiness or positive emotions
Negative Logits
pse            -0.67
amental        -0.66
Inqu           -0.61
calculating    -0.60
isers          -0.57
eem            -0.56
EPA            -0.55
execute        -0.55
uay            -0.55
orem           -0.55
Positive Logits
agos               0.80
clus               0.78
Congratulations    0.65
clusive            0.62
finally            0.62
DN                 0.61
ped                0.61
:-)                0.60
:)                 0.60
!]                 0.60
Activations Density: 0.546%
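
The dashboard does not say how the density figure is computed. As a rough sketch, assuming it is the fraction of token positions on which the feature's activation is above zero, the calculation could look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled token positions on
    which the feature fires at all. The dashboard's exact definition is
    not stated in the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.500%"
```

Under this reading, a density of 0.546% would mean the feature fires on roughly 5 or 6 of every 1000 sampled tokens.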