INDEX
Explanations
words appearing in a historical play script
textual references related to books or titles
Single characters
NEGATIVE LOGITS
itſelf      -1.55
raiſ        -1.39
Theſe       -1.35
myſelf      -1.34
ſelves      -1.31
ſelf        -1.30
ſever       -1.29
themſelves  -1.28
auffi       -1.26
ſtate       -1.26
POSITIVE LOGITS
t      0.69
       0.67
,      0.64
(      0.63
da     0.61
G      0.60
g      0.59
T      0.57
as     0.57
<eos>  0.56
Activations Density 3.664%