Explanations
sequences of underscores or repeated special characters
Negative Logits
<tr>          -0.86
.             -0.83
it            -0.66
</tr>         -0.65
a             -0.65
[toxicity=0]  -0.64
)             -0.64
the           -0.63
The           -0.63
I             -0.62
Positive Logits
+#+           1.44
Efq           1.24
Reſ           1.23
Shakspeare    1.22
Jefus         1.21
Majefty       1.17
Monfieur      1.15
Chriftian     1.14
Datuak        1.14
itſelf        1.14
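The dashboard does not say how these logit values are computed. In many sparse-autoencoder dashboards, a feature's positive and negative logits come from projecting its decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. A minimal pure-Python sketch of that idea, assuming hypothetical helpers `logit_effects` and `top_and_bottom` and toy matrices in place of real model weights:

```python
# Hypothetical sketch: assumes logits = decoder direction projected through W_U.
# The real dashboard's computation is not given in the source.

def logit_effects(decoder_dir, unembed):
    """Return one logit contribution per vocabulary token.

    decoder_dir: list of d_model floats (assumed: the feature's decoder row).
    unembed: d_model rows, each a list of vocab_size floats (assumed: W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and k most negative (token, value) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    bottom = [(vocab[t], effects[t]) for t in order[:k]]
    top = [(vocab[t], effects[t]) for t in order[::-1][:k]]
    return top, bottom


# Toy usage: a 2-dimensional "model" with a 4-token vocabulary.
vocab = ["the", "+#+", "Reſ", "."]
effects = logit_effects([1.0, -0.5], [[0.1, 1.0, 0.8, -0.4], [0.6, -0.4, 0.2, 0.5]])
top, bottom = top_and_bottom(effects, vocab, 2)
print([tok for tok, _ in top])     # most positive first: "+#+", then "Reſ"
print([tok for tok, _ in bottom])  # most negative first: ".", then "the"
```

Sorting once and slicing from both ends yields both lists from a single pass, which matches how the two tables above are ordered (most negative first, most positive first).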
Activation Density: 0.908%
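Activation density is presumably the share of token positions on which this feature fires; at 0.908%, that would be roughly 9 tokens in every 1,000. The source states neither the firing threshold nor the token sample, so the sketch below assumes "fires" means an activation strictly above zero, and the function name `activation_density` is hypothetical:

```python
# Hypothetical sketch: assumes density = % of token positions with activation > threshold.

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: 9 of 1,000 token positions fire.
acts = [0.0] * 991 + [0.5] * 9
print(f"{activation_density(acts):.3f}%")  # prints 0.900%
```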