Explanations
text related to articles or programming instructions
the presence of articles and other grammatical elements in text
Negative Logits
''.
-0.92
``
-0.86
.}
-0.78
"},
-0.74
cffff
-0.72
.''.
-0.71
});
-0.71
.''
-0.70
mathemat
-0.69
.",
-0.69
Positive Logits
–
1.48
-
1.25
--
1.14
—
1.10
–
1.06
ãĥ»
0.92
âĢķ
0.88
—
0.88
--
0.84
)—
0.78
Activations Density 0.206%
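
Two of the positive-logit tokens (`ãĥ»` and `âĢķ`) look like byte-level BPE display strings rather than literal text. The sketch below is a minimal decoder, assuming the dashboard's tokenizer uses the GPT-2 style byte-to-unicode mapping; that is an assumption, since the source does not name the tokenizer. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed mapping)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Decode a byte-level display string to text; return it unchanged if it is not one."""
    if any(ch not in UNICODE_TO_BYTE for ch in token):
        # Contains characters outside the byte alphabet, so it is already literal text.
        return token
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00e3\u0125\u00bb", "\u00e2\u0122\u0137", "--"]:
        decoded = decode_token(tok)
        print(repr(tok), "->", repr(decoded), [hex(ord(c)) for c in decoded])
```

Under that assumption, `ãĥ»` decodes to U+30FB (katakana middle dot) and `âĢķ` to U+2015 (horizontal bar), which would fit the dash-like tokens that dominate the positive logits.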