INDEX
Explanations
quotations, particularly at the ends of sentences
punctuation
Negative Logits
Token        Logit
(            -0.59
M            -0.57
u            -0.56
H            -0.55
L            -0.52
lis          -0.52
z            -0.52
########.    -0.51
-            -0.51
za           -0.50
Positive Logits
Token        Logit
}}$}         1.13
"}>          1.13
()");        1.13
}");         1.10
myſelf       1.09
itſelf       1.08
"]:          1.07
"});         1.06
?");         1.06
%");         1.05
Activations Density: 1.243%