INDEX
Explanations
instances of punctuation and separators in a textual context
punctuation and following words
Negative Logits
dead      -0.37
tra       -0.34
sur       -0.34
mad       -0.33
contra    -0.32
Rol       -0.32
Scar      -0.31
MAD       -0.31
Mada      -0.30
Tam       -0.30
Positive Logits
UnusedPrivate     0.78
للمعارف           0.77
Diweddarwch       0.75
AndEndTag         0.73
RegressionTest    0.72
Personensuche     0.66
<unused52>        0.64
<unused41>        0.63
<unused28>        0.63
<unused3>         0.63
Activations Density 0.224%