INDEX
Explanations
references to hyperbolic expressions and exaggerations
Negative Logits
を見る              -0.15
…"↵↵               -0.15
Sesso               -0.14
eparator            -0.14
UDENT               -0.14
şeyi                -0.13
oplan               -0.13
ึ่                  -0.13
StringComparison    -0.13
ffect               -0.13
Positive Logits
anyone      0.43
anybody     0.37
FT          0.34
?           0.33
Anyone      0.31
alert       0.29
indeed      0.28
Anyone      0.27
included    0.25
ft          0.24
Activations Density 0.541%