INDEX
Explanations
phrases indicating evaluation or judgment
phrases related to conditions and limitations involving potential actions or guarantees
Negative Logits
�    -0.79
,)    -0.73
.),    -0.72
ãĢĮ    -0.70
.)    -0.67
.):    -0.66
?),    -0.65
?,    -0.63
§§    -0.62
----------------    -0.61
Positive Logits
%"    1.41
"    1.31
usterity    1.23
".[    1.19
".    1.13
"?    1.13
'"    1.09
"—    1.06
"!    1.06
"[    1.04
Activations Density 0.353%