INDEX
Explanations
elements related to identity or personal attributes
Follows dialogue or a question
Well, Okay, Yes, judge, Prior
Negative Logits
itſelf    -1.09
()?;      -0.99
.";       -0.98
فريبيس    -0.96
=?";      -0.94
)";       -0.94
ſind      -0.93
✨:       -0.92
".        -0.91
%";       -0.90
Positive Logits
I         0.84
you       0.66
!         0.64
.         0.63
[         0.62
I         0.61
,         0.60
he        0.59
(         0.58
because   0.53
Activations Density 0.155%