INDEX
Explanations
expressions of strong emotions or reactions
Negative Logits
↵          -0.23
/or        -0.17
ï¼īãģ¯    -0.16
.]↵↵       -0.16
_B         -0.16
_S         -0.16
-S         -0.15
|x         -0.15
_Syntax    -0.15
-B         -0.15
Positive Logits
11         0.33
111        0.31
1          0.31
!(         0.25
(          0.22
’          0.22
<          0.21
10         0.21
!--        0.21
12         0.20
Activations Density: 0.013%