INDEX
Explanations
markers of text structure or segmentation
Negative Logits
Token            Logit
</em>            -0.78
</td>            -0.76
</h6>            -0.76
</blockquote>    -0.76
</code>          -0.72
</i>             -0.71
</s>             -0.70
</h5>            -0.70
…                -0.70
</strong>        -0.67
Positive Logits
Token            Logit
deſt             0.92
feroit           0.89
pleaſure         0.87
fubject          0.87
auroit           0.86
poffible         0.85
beſt             0.83
becauſe          0.83
myſelf           0.82
Monfieur         0.82
Activations Density: 0.267%