Explanations
promotional language related to offers and deals
Negative Logits
Token      Logit
...↵↵      -0.23
)--        -0.21
“↵↵        -0.21
[--        -0.20
.....      -0.20
....↵↵     -0.20
~          -0.20
's         -0.20
..↵↵       -0.20
â̦↵↵       -0.19
Positive Logits
Token      Logit
&apos      0.34
»          0.29
«          0.28
apos       0.25
.»         0.23
».         0.21
\'         0.20
'''        0.20
»          0.20
’          0.20
Activations Density: 0.017%