Explanations
phrases related to exaggeration or hyperbole
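Negative Logits
ãĤĴè¦ĭãĤĭ    -0.17  (decodes to を見る)
â̦"↵↵        -0.17  (decodes to …" followed by two newlines)
Sesso        -0.16
ffect        -0.15
.generated   -0.14
eparator     -0.14
ãģ£ãģį       -0.14  (decodes to っき)
.xmlbeans    -0.14
â̦”↵↵        -0.14  (decodes to …” followed by two newlines)
Ïħμ          -0.14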
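Several negative-logit tokens above look garbled because they are shown in what appears to be GPT-2-style byte-level BPE form, where every raw byte is mapped to a printable character before merging. The source does not name the tokenizer, so treat this as an assumption. Under it, a minimal sketch that reverses the byte-to-character table and recovers the UTF-8 text:

```python
def bytes_to_unicode() -> dict[int, str]:
    # GPT-2-style reversible table: printable bytes map to themselves,
    # the remaining bytes are shifted to code points 256 and up.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    # Map each displayed character back to its byte, then decode as UTF-8.
    # errors="replace" covers tokens that end mid-character.
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤĴè¦ĭãĤĭ"))  # を見る
print(decode_token("ãģ£ãģį"))      # っき
```

Tokens that split a multi-byte character (the last negative-logit entry may be one) only decode partially, which is why `errors="replace"` is used.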
Positive Logits
anyone       0.40
?            0.38
anybody      0.33
FT           0.30
Anyone       0.28
is           0.28
indeed       0.25
perhaps      0.25
yes          0.25
!            0.25
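Lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest and lowest scores. The source does not state how its values were computed, so the sketch below assumes a plain dot product (no final LayerNorm scaling) and uses a hypothetical three-dimensional model with made-up token names and weights:

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: dict[str, Sequence[float]]) -> dict[str, float]:
    # Assumed definition: dot product of the feature's decoder direction
    # with each token's unembedding vector.
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: dict[str, float], k: int):
    # Highest k scores are the "positive logits", lowest k the "negative logits".
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], list(reversed(ranked))[:k]


# Hypothetical toy weights, not taken from any real model.
decoder_dir = [1.0, 0.0, -1.0]
unembed = {
    "tok_a": [3.0, 0.0, 0.0],
    "tok_b": [0.0, 5.0, 1.0],
    "tok_c": [1.0, 1.0, 0.0],
    "tok_d": [0.0, 0.0, 2.0],
}

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # [('tok_a', 3.0), ('tok_c', 1.0)]
print(negative)  # [('tok_d', -2.0), ('tok_b', -1.0)]
```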
Activation Density: 0.810%
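Activation density is usually read as the share of token positions on which the feature fires. Assuming "fires" means an activation strictly above zero (the source does not give the threshold), a density of 0.810% corresponds to about 81 of every 10,000 tokens. A minimal sketch with hypothetical activation values:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    # Percentage of token positions whose activation exceeds the threshold.
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 81 firing positions out of 10,000 tokens.
acts = [0.0] * 9919 + [0.7] * 81
print(f"{activation_density(acts):.3f}%")  # 0.810%
```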