INDEX
Explanations
patterns or sequences of underscores
Negative Logits
Token              Logit
“                  -0.69
↵                  -0.63
.                  -0.63
[no token shown]   -0.54
<eos>              -0.52
↵↵                 -0.50
</b>               -0.50
(                  -0.50
…                  -0.50
"                  -0.48
Positive Logits
Token              Logit
tagHelperRunner    1.44
CreateTagHelper    1.43
myſelf             1.34
itſelf             1.31
pleaſure           1.25
RetentionPolicy    1.24
houſe              1.20
+#+                1.20
Efq                1.20
iſt                1.19
Activations Density: 0.380%