Explanations
the presence of specific formatting or symbols, particularly whitespace or empty characters, in the text
Negative Logits
themſelves      -0.90
himſelf         -0.84
myſelf          -0.80
itſelf          -0.79
springfox       -0.78
Shakspeare      -0.73
fromCharCode    -0.68
fubject         -0.66
pleaſure        -0.66
reaſon          -0.65
Positive Logits
the              1.10
<eos>            0.76
the              0.75
The              0.70
THE              0.61
ScopeManager     0.61
WindowConstants  0.59
its              0.58
that             0.57
our              0.56
Activations Density 0.262%