INDEX
Explanations
repeated or emphasized phrases, particularly those lacking specific content
followed by punctuation or special characters
legal citations and statutes
Negative Logits
}$       -0.52
<eos>    -0.52
</b>     -0.48
`        -0.47
$        -0.47
care     -0.45
-        -0.44
Las      -0.44
2        -0.43
do       -0.43
Positive Logits
myſelf             0.93
Efq                0.93
\                  0.88
ConstraintMaker    0.86
Houſe              0.84
pleaſure           0.84
Anſ                0.83
InjectAttribute    0.82
Theſe              0.79
ſelf               0.79
Activations Density: 0.045%