Explanations
phrases that indicate uncertainty or speculation
Negative Logits
są      -0.15
一样    -0.14
339     -0.14
@js     -0.14
一起    -0.14
ῶ       -0.14
same    -0.13
Same    -0.13
è»      -0.13
sla     -0.13
Positive Logits
nobody     0.35
few        0.29
none       0.29
everyone   0.26
many       0.23
everybody  0.22
no         0.22
anyone     0.21
def        0.21
NONE       0.21
Activations Density 0.215%