INDEX
Explanations
the formatting of bullet points or lists in written content
emoticons or punctuation
Negative Logits
gestone              -0.75
تضيفلها              -0.73
Eilish               -0.72
arşivlendi           -0.69
cticut               -0.69
httphttps            -0.68
increí               -0.66
majánló              -0.66
存于互联网档案馆       -0.66
Grüsse               -0.65
POSITIVE LOGITS
:-                   0.91
{-                   0.79
:-                   0.77
;-                   0.75
.-                   0.75
(-                   0.75
=-                   0.74
-                    0.72
!-                   0.72
,-                   0.71
Activations Density 0.043%