INDEX
Explanations
titles or positions within a hierarchical context
Negative Logits
myſelf     -1.10
houſe      -1.09
itſelf     -1.07
Theſe      -1.06
pleaſure   -1.05
ſche       -1.03
Jefus      -0.99
Houſe      -0.98
greateſt   -0.96
Majefty    -0.95
Positive Logits
or         0.58
last       0.55
<eos>      0.55
,          0.54
$\         0.53
           0.51
.          0.51
J          0.50
in         0.48
O          0.48
Activations Density 0.471%