INDEX
Explanations
sentences that express opinions or reflections on societal issues
Negative Logits
=$?               -1.04
oprot             -0.93
pleaſure          -0.92
ſelf              -0.87
myſelf            -0.86
theless           -0.86
ProtoMessage      -0.86
🏻♀️                -0.83
yntaxException    -0.82
Majefty           -0.82
Positive Logits
n                 0.64
I                 0.63
<                 0.57
what              0.57
I                 0.55
(                 0.55
↵↵                0.55
And               0.54
And               0.54
What              0.54
Activations Density: 0.287%