INDEX
Explanations
brackets and nested structures in mathematical expressions
Negative Logits
itſelf      -1.17
pleaſure    -1.13
myſelf      -1.11
ſelves      -1.05
Jefus       -1.05
preſent     -1.05
Reſ         -1.05
raiſ        -1.05
Majefty     -1.05
ſtate       -1.03
Positive Logits
{               1.08
‘               0.83
“               0.82
“               0.72
(               0.72
/               0.62
__["            0.61
[]{             0.61
[toxicity=0]    0.60
‘               0.60
Activations Density 0.121%