INDEX
Explanations
punctuation marks and quotation characters
Negative Logits
Token                  Logit
,                      -0.82
and                    -0.80
-                      -0.69
-                      -0.67
[token not rendered]   -0.67
&                      -0.63
.                      -0.62
/                      -0.61
–                      -0.59
com                    -0.57
Positive Logits
Token      Logit
".         1.30
]";        1.23
.";        1.17
")));      1.16
%";        1.16
.",        1.15
"]));      1.10
),"        1.10
],"        1.09
"]:        1.07
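Lists like the negative and positive logits above are commonly produced by scoring every vocabulary token against a feature's output direction and keeping the extremes. The page does not say how its values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logits`, the toy `direction` and `unembed` values, and the four-token vocabulary are all hypothetical and are not taken from this page.

```python
def top_logits(direction, unembed, vocab, k=2):
    """Return the k most negative and k most positive (token, score) pairs.

    direction: feature output direction (list of floats), hypothetical.
    unembed:   one row of weights per vocabulary token, hypothetical.
    """
    # Score each token as the dot product of its row with the direction.
    scores = [sum(d * w for d, w in zip(direction, row)) for row in unembed]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy data chosen for illustration only.
vocab = [",", "and", '".', ']";']
direction = [1.0, -1.0]
unembed = [
    [-0.5, 0.5],    # ","
    [-0.25, 0.25],  # "and"
    [1.0, -0.5],    # '".'
    [0.5, -0.25],   # ']";'
]

negative, positive = top_logits(direction, unembed, vocab)
print("negative:", negative)
print("positive:", positive)
```

With real model weights the same ranking would run over the full vocabulary. The toy values here only show the mechanics.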
Activations Density 0.755%
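An activation density figure such as the one above is usually read as the fraction of token positions on which the feature is active. The page does not define the term, so the sketch below assumes that reading. The function name `activation_density`, the zero threshold, and the sample of 20,000 positions are hypothetical choices for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 20,000 token positions, 151 of them active.
acts = [0.0] * 19849 + [1.5] * 151
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.755% means the feature fires on roughly 7 or 8 of every 1,000 tokens.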