INDEX
Explanations
specific symbols or formatting elements in the text
Negative Logits
↵↵        -0.68
-         -0.57
tr        -0.54
-         -0.52
y         -0.52
:         -0.49
...       -0.48
Desde     -0.48
find      -0.47
us        -0.47
Positive Logits
ibr       1.30
purpoſe   1.23
myſelf    1.21
reaſon    1.14
houſe     1.14
Majefty   1.14
ſeveral   1.13
pleaſure  1.13
raiſ      1.11
Anſ       1.10
Activations Density 0.000%