INDEX
Explanations
specific formatting elements and punctuation in the text
NEGATIVE LOGITS
åİ	-0.19
تÙĨ	-0.17
urette	-0.16
İ	-0.16
Erk	-0.16
ãĥĪ	-0.15
ushima	-0.15
edin	-0.15
ãĥĪ	-0.15
.Std	-0.14
POSITIVE LOGITS
ivate	0.18
ás	0.16
اÙĦÙĬÙħÙĨ	0.16
ias	0.15
Pam	0.15
ante	0.15
heimer	0.15
AS	0.15
èĩ¨	0.14
-as	0.14
Activations Density 0.065%