Explanations
instances of the start of textual segments or paragraphs
Negative Logits
Token       Logit
.           -0.70
>();        -0.69
');         -0.67
>();        -0.65
);          -0.65
be          -0.63
;           -0.62
also        -0.61
…           -0.58
");         -0.58
Positive Logits
Token       Logit
Jefus       0.96
Shakspeare  0.92
himſelf     0.92
itſelf      0.86
myſelf      0.84
Monfieur    0.83
<bos>       0.81
theſe       0.78
Theſe       0.78
Eſ          0.76
Activations Density: 0.864%