Explanations
references to specific pairs or categories of information
Negative Logits
the                   -0.48
T                     -0.48
/                     -0.47
and                   -0.47
(                     -0.47
(token not shown)     -0.45
,                     -0.44
<i>                   -0.43
2                     -0.43
(token not shown)     -0.42
Positive Logits
queſta                0.99
avoient               0.91
plufieurs             0.85
approximate           0.85
ſammen                0.85
nôtre                 0.84
Monfieur              0.84
accurate              0.84
<unused43>            0.83
zwiſchen              0.83
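The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. This dashboard does not say how those numbers are produced. A common approach, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and then keep the k lowest and k highest entries. The sketch below uses plain Python. The names `logit_effect` and `top_and_bottom_logits` and the toy matrices are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effect(
    decoder_dir: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: w_u has shape (d_model, vocab_size), so the effect on
    token j is the dot product of decoder_dir with column j of w_u.
    """
    return {
        tok: sum(d * w_u[i][j] for i, d in enumerate(decoder_dir))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom_logits(
    effect: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(effect.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Hypothetical usage: a toy 2-dimensional model with a 3-token vocabulary.
effect = logit_effect([1.0, 0.5], [[0.2, -0.4, 0.0], [0.4, 0.0, -0.2]], ["a", "b", "c"])
neg, pos = top_and_bottom_logits(effect, k=1)
print(neg, pos)
```

Sorting the whole vocabulary once is simple but costs O(V log V). For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would avoid the full sort.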
Activation density: 0.362%
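Activation density is usually read as the percentage of sampled tokens on which the feature's activation is above zero. A density of 0.362% therefore means the feature fires on roughly 1 token in 276. That reading, the `threshold` parameter, and the function name `activation_density` are assumptions for illustration, not details taken from this dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation to compute a density")
    firing = sum(1 for a in acts if a > threshold)  # count tokens where the feature fires
    return 100.0 * firing / len(acts)


# Hypothetical sample: the feature fires on 362 of 100,000 tokens.
sample = [1.0] * 362 + [0.0] * (100_000 - 362)
print(activation_density(sample))
```

The function returns a percentage rather than a fraction, so its output can be compared directly with the figure shown on the dashboard.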