Explanations
occurrences of quantifiers and references to groups or quantities
Negative Logits
myſelf      -1.65
itſelf      -1.53
ſind        -1.50
^(@)        -1.46
Monfieur    -1.46
iſt         -1.45
Anſ         -1.43
ſelves      -1.42
―――――       -1.42
дописавши   -1.41
Positive Logits
<eos>        0.98
,            0.89
.            0.84
↵            0.83
-            0.83
and          0.83
in           0.82
of           0.82
for          0.79
(            0.75
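Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below shows that projection in plain Python. It assumes a direct decoder-times-unembedding product and ignores any final layer norm; the function name top_logits and its arguments are hypothetical, not this dashboard's actual code.

```python
def top_logits(decoder_dir, unembed, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by direct logit effect.

    decoder_dir: list of d_model floats (the feature's decoder vector)
    unembed:     d_model rows, each a list of len(vocab) floats (W_U)
    vocab:       list of token strings
    Assumption: no final layer norm is applied before unembedding.
    """
    d_model = len(decoder_dir)
    # Logit for token t is the dot product of the decoder vector with column t of W_U.
    logits = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logits([1.0, -1.0], [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]], ["a", "b", "c"], k=1)
print(neg, pos)
```

With real weights, vocab would be the model's full tokenizer vocabulary and k=10 would match the length of each list shown here.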
Activation density: 0.830%
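Activation density is usually read as the share of sampled tokens on which the feature fires. Under that assumption, 0.830% means roughly 8 or 9 tokens in every 1,000. The sketch below computes that percentage from a list of per-token activations. The name activation_density and the strict "greater than zero" threshold are assumptions for illustration, not this dashboard's documented definition.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    if not activations:
        return 0.0
    # Count tokens where the feature is considered "active".
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Two of four tokens fire, giving a density of 50%.
print(activation_density([0.0, 0.5, 0.0, 2.0]))
```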