INDEX
Explanations
phrases or statements within quotation marks
quotations and dialogue
Negative Logits
,          -0.68
)=         -0.65
Ͻ          -0.61
.          -0.60
ulic       -0.60
)          -0.59
itan       -0.57
)/         -0.55
paren      -0.55
)\         -0.55
Positive Logits
/"         0.99
]:         0.58
--         0.57
sic        0.56
namely     0.56
["         0.55
that       0.54
because    0.53
referring  0.53
—          0.53
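Logit lists like these are commonly produced by projecting a sparse-autoencoder feature's decoder direction onto the model's unembedding matrix and sorting the per-token scores. The sketch below assumes that method; it is not confirmed to be how this dashboard computes its numbers. The names `top_logits`, `w_dec`, and `W_U` are hypothetical, and the weights are toy values, not this feature's.

```python
import numpy as np

def top_logits(w_dec, W_U, vocab, k=10):
    """Rank vocabulary tokens by the direct logit effect of one feature direction.

    Assumed inputs (hypothetical names):
      w_dec: (d_model,) decoder vector of the feature
      W_U:   (d_model, d_vocab) unembedding matrix
      vocab: list of d_vocab token strings
    """
    scores = w_dec @ W_U              # (d_vocab,) logit contribution per token
    order = np.argsort(scores)        # token indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 4-dimensional residual stream, 5-token vocabulary.
vocab = ["a", "b", "c", "d", "e"]
W_U = np.eye(4, 5)                    # token "e" gets an all-zero column
w_dec = np.array([0.5, -0.2, 0.9, -0.7])
positive, negative = top_logits(w_dec, W_U, vocab, k=2)
print(positive)   # [('c', 0.9), ('a', 0.5)]
print(negative)   # [('d', -0.7), ('b', -0.2)]
```

Because the scores depend only on the weights and not on any input text, they describe which tokens the feature pushes the model toward or away from when it fires.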
Activation Density: 0.111%
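Activation density is usually reported as the fraction of sampled token positions on which the feature's activation is strictly positive. Under that assumption, 0.111% means roughly 1.1 firing positions per 1,000 tokens. A minimal sketch with a hypothetical helper `activation_density` and toy activations chosen only to reproduce the figure:

```python
import numpy as np

def activation_density(acts):
    """Fraction of token positions where the feature activation is > 0."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size

# Toy data (not real activations): 111 of 100,000 positions fire.
acts = np.zeros(100_000)
acts[:111] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")   # prints 0.111%
```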