INDEX
Explanations
expressions of difficulty or trouble
Negative Logits
adj            -0.66
itsch          -0.65
aware          -0.65
interstitial   -0.64
merce          -0.63
yond           -0.63
arning         -0.63
estones        -0.63
Posts          -0.62
UC             -0.61
Positive Logits
conversations   0.95
luck            0.89
dealings        0.83
conversation    0.83
intercourse     0.82
discussions     0.80
fun             0.79
chats           0.77
Thanksgiving    0.77
haircut         0.74
Activation Density: 0.146%