Explanations
expressions of excitement or enthusiasm
Negative Logits
=                    -0.91
]-->                 -0.74
;-)                  -0.73
;-)                  -0.72
:-)                  -0.69
Arxivat              -0.67
[token not shown]    -0.65
AnchorStyles         -0.65
CHtml                -0.63
(^_^;)               -0.62
Positive Logits
idk                  0.87
🥺                   0.84
tbh                  0.83
ngl                  0.83
Idk                  0.81
🥲                   0.77
lmao                 0.76
Idk                  0.75
🥺                   0.74
😭                   0.72
Activations Density 0.122%