Explanations
exclamatory expressions that convey strong emotions or reactions
Negative Logits
ica       -0.18
anc       -0.15
ICA       -0.15
roje      -0.15
ixa       -0.14
bot       -0.14
cki       -0.14
ownt      -0.14
thed      -0.14
ä¸įå¾Ĺ    -0.14
Positive Logits
!!.       0.26
!↵        0.23
!!!!↵↵    0.23
111       0.21
!!!       0.21
!↵↵       0.20
[](       0.20
!"        0.18
!!!↵↵     0.18
!!↵       0.18
Activations Density: 0.015%
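
The page does not say how the density figure is computed. A common reading is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below works under that assumption: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, chosen so that 3 active tokens out of 20,000 reproduce a density of 0.015%.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires at all; the source page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 20,000 tokens.
sample = [0.0] * 20_000
sample[5] = 1.2
sample[1_234] = 0.4
sample[19_999] = 3.1

density = activation_density(sample)
print(f"Activations Density: {density:.3%}")
```

A density this low would mean the feature is silent on almost all tokens, which fits a feature tied to a narrow pattern such as runs of exclamation marks.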