INDEX
Explanations
parentheses and related punctuation in text
Negative Logits
ویکیپدیا    -0.82
Skocz    -0.64
referenties    -0.64
wij    -0.60
depend    -0.60
"    -0.59
teto    -0.59
<blockquote>    -0.59
idxs    -0.59
tsz    -0.58
Positive Logits
(    1.50
(    1.33
”(    1.14
』(    1.11
》(    1.08
)(    1.06
!(    1.05
!(    1.04
?(    1.03
」(    0.98
Activations Density: 0.035%