INDEX
Explanations
expressions of frustration and commentary related to specific tasks or systems
ending in "self" or "selves"
myself, oneself, himself
Negative Logits
Token     Logit
”.        -0.64
)”.       -0.61
).[       -0.59
“.        -0.58
=".       -0.57
,“        -0.56
”).       -0.56
”.        -0.56
!】       -0.56
ofluor    -0.55
Positive Logits
Token         Logit
myſelf        1.01
itſelf        0.88
tbh           0.86
myself        0.84
himſelf       0.82
themſelves    0.78
ſelf          0.77
pleaſure      0.76
haha          0.75
hehe          0.71
Activations Density: 0.631%