INDEX
Explanations
expressions of personal opinions and emotional responses to experiences or preferences
Negative Logits
елеф
-0.16
坊
-0.14
луч
-0.14
NAL
-0.14
idon
-0.13
->{_
-0.13
FML
-0.13
immel
-0.13
вся
-0.13
クト
-0.13
Positive Logits
enjoy
0.54
likes
0.51
enjoys
0.50
enjoying
0.49
love
0.48
liked
0.47
enjoyed
0.47
Enjoy
0.47
liking
0.46
loves
0.42
Activations Density 0.491%