Explanations
elements of formatting and structure in text
Negative Logits
Efq              -1.00
NUMX             -0.99
Jefus            -0.96
pleaſure         -0.94
Theſe            -0.94
ſelf             -0.93
ſelves           -0.93
出版年           -0.92
itſelf           -0.92
ConstraintMaker  -0.92
Positive Logits
<eos>            0.96
↵                0.91
↵↵               0.77
↵↵↵              0.68
↵↵↵↵             0.63
↵↵↵↵↵            0.52
.                0.45
:                0.45
ud               0.44
↵↵↵↵↵↵           0.43
Activation density: 0.803%
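The density figure summarizes how often the feature fires. As a minimal sketch, assuming density is the percentage of token positions where the feature's activation is strictly positive (a common convention for such dashboards, not stated on this page), it could be computed like this. The function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations):
    """Return the percentage of token positions where the feature is active.

    Assumption: "active" means activation > 0, and `activations` is a flat
    sequence of per-token activation values for one feature.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    # Count positions where the feature fires at all.
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical example: the feature fires on 8 of 1,000 token positions.
acts = [0.0] * 992 + [1.5] * 8
print(f"Activation density: {activation_density(acts):.3f}%")  # Activation density: 0.800%
```

On this reading, a density of 0.803% means the feature is active on roughly 8 of every 1,000 tokens.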