INDEX
Explanations
expressions of complaint or dissatisfaction
Negative Logits
$}}         -0.63
']}         -0.58
]})         -0.58
))}         -0.58
.)}         -0.57
}}}}        -0.55
]}>         -0.54
devamını    -0.53
'})         -0.52
"]}         -0.51
Positive Logits
whining        0.98
whine          0.93
spoiled        0.87
complaining    0.79
spoilt         0.74
complains      0.69
entitlement    0.68
narciss        0.67
complained     0.66
complain       0.66
Activations Density: 0.204%