Explanations
expressions related to emotions, especially deep feelings like gratitude, loss, and determination
phrases connected to personal emotions and introspection
Negative Logits
iste       -0.65
GOODMAN    -0.59
bley       -0.53
inth       -0.52
semb       -0.50
nia        -0.50
WATCHED    -0.50
ANS        -0.49
going      -0.49
ymm        -0.49
Positive Logits
your     1.14
their    1.14
his      1.14
his      1.13
YOUR     1.09
their    1.08
your     1.07
my       1.02
THEIR    0.99
HIS      0.96
Activations Density 0.691%