INDEX
Explanations
expressions of satisfaction or gratitude
expressions of happiness or contentment
Negative Logits
Format        -0.69
Downloadha    -0.68
perse         -0.66
ciplinary     -0.65
ults          -0.65
catentry      -0.64
cum           -0.64
ufact         -0.64
uria          -0.64
cano          -0.62
Positive Logits
they          0.74
Tid           0.71
we            0.70
you           0.68
tid           0.68
THEY          0.67
fully         0.65
somebody      0.64
nobody        0.63
terday        0.63
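The two lists above are the ten vocabulary tokens whose logits the feature pushes down most and up most. The following is a minimal sketch of how such lists can be ranked, assuming you already have one logit-effect value per vocabulary token (for example, the feature's decoder direction projected through the model's unembedding; that derivation is an assumption and is not shown on this page). The function name `top_logits` is hypothetical.

```python
def top_logits(logit_effects, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumption: logit_effects[i] is the feature's effect on the logit of vocab[i].
    """
    if len(logit_effects) != len(vocab):
        raise ValueError("logit_effects and vocab must have the same length")
    pairs = list(zip(vocab, logit_effects))
    # Ascending sort: the most negative effects come first.
    ranked = sorted(pairs, key=lambda p: p[1])
    negative = ranked[:k]
    # Reversed order: the largest positive effect is listed first.
    positive = ranked[::-1][:k]
    return negative, positive


# Toy usage with a handful of tokens taken from the lists above.
vocab = ["they", "we", "Format", "cano", "the"]
effects = [0.74, 0.70, -0.69, -0.62, 0.01]
neg, pos = top_logits(effects, vocab, k=2)
print(neg)  # [('Format', -0.69), ('cano', -0.62)]
print(pos)  # [('they', 0.74), ('we', 0.7)]
```

A plain sort keeps the sketch dependency-free; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` avoids sorting everything.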
Activations Density 0.067%
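An activation density of 0.067% means the feature fires on roughly 0.00067 of all tokens, or about 67 tokens in every 100,000. A minimal sketch of that ratio follows, assuming "fires" means an activation strictly above a threshold of zero; the dashboard's exact threshold and token sample are not stated here, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Worked example: 67 active tokens out of 100,000 gives 0.067%.
acts = [0.0] * 99_933 + [1.0] * 67
density = activation_density(acts)
print(f"{density:.3%}")  # 0.067%
```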