INDEX
Explanations
mathematical or symbolic notation and references to figures or steps in technical descriptions.
scientific figures/sections
Negative Logits
1        -0.85
(        -0.74
4        -0.73
0        -0.72
2        -0.72
<eos>    -0.72
3        -0.71
,        -0.71
-        -0.71
S        -0.71
Positive Logits
Efq        1.41
itſelf     1.39
ſeveral    1.38
myſelf     1.38
houſe      1.37
purpoſe    1.36
raiſ       1.34
Theſe      1.30
ſche       1.28
ſelf       1.27
Activations Density: 9.983%