INDEX
Explanations
instances of laughter and amusement in conversations
Negative Logits
ainment    -0.17
adge       -0.16
acie       -0.15
aney       -0.15
ipse       -0.15
anner      -0.15
revers     -0.14
itesse     -0.14
otch       -0.14
enaire     -0.14
Positive Logits
ingly      0.24
stocks     0.22
harder     0.22
upro       0.22
heart      0.20
ably       0.20
upro       0.20
hardest    0.19
uncont     0.19
stock      0.19
Activations Density: 0.033%