INDEX
Explanations
mentions of specific names and titles, particularly related to sports journalism and writing
phrases with repeated characters or symbols
Negative Logits
hift        -0.71
omm         -0.70
ixel        -0.68
paces       -0.67
zac         -0.67
izzard      -0.64
omething    -0.64
tery        -0.61
ARB         -0.60
creen       -0.60
Positive Logits
--------------------    0.80
=-=-                    0.78
Prev                    0.76
=-                      0.75
-.                      0.74
=~=~                    0.74
_-                      0.71
---------               0.71
denotes                 0.70
âĸł                     0.69
Activations Density 0.042%