Explanations
phrases that indicate statistics or numerical values in discussions
Negative Logits
↵          -0.40
(          -0.27
's         -0.26
␣ (space)  -0.25
:          -0.25
"          -0.24
're        -0.23
’s         -0.23
<br        -0.23
↵↵         -0.22
Positive Logits
/'         0.21
â̲          0.16
ãĢģ“       0.15
ï¼Į“       0.15
ãĢģãĢĮ     0.15
[$_        0.14
ï¸ı        0.14
ulses      0.14
egal       0.14
®,         0.14
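On dashboards of this kind, the logit values are commonly obtained by projecting the feature's decoder direction through the model's unembedding and listing the most negative and most positive results. The sketch below illustrates that ranking step under that assumption. `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical names, and the toy vectors are invented for illustration; they are not this feature's real weights.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot each token's (hypothetical) unembedding vector with the feature's decoder direction."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive tokens, each list sorted by magnitude."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by logit effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: a 2-dimensional "model" with a 3-token vocabulary (invented values).
decoder_dir = [1.0, -0.5]
unembed = {"a": [0.2, 0.1], "b": [-0.3, 0.4], "c": [0.5, -0.2]}
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(neg, pos)
```

A real dashboard would run the same ranking over the full vocabulary, which is why the lists above contain byte-level tokenizer pieces such as `ãĢģ“` rather than whole words.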
Activations Density 1.098%
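Activation density is usually read as the share of tokens on which the feature is active, so 1.098% means it fires on roughly 11 of every 1,000 tokens. The sketch below computes that percentage under the assumption that "active" means an activation strictly above zero; the `threshold` parameter and the sample activations are hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition of 'active')."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Invented activations over 8 tokens: the feature fires on 2 of them.
sample = [0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
print(activation_density(sample))  # 25.0
```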