Explanations
sections of text that appear to be headings or titles within a document
Negative Logits
</em>         -0.56
,             -0.53
.             -0.53
</i>          -0.51
(             -0.50
[             -0.50
o             -0.48
s             -0.47
–             -0.47
on            -0.46
Positive Logits
Jefus          1.13
pleaſure       1.08
tvguidetime    1.06
greateſt       1.05
Efq            1.05
raiſ           1.03
houſe          0.99
Theſe          0.99
myſelf         0.96
ſelf           0.96
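Logit lists like the Negative Logits and Positive Logits lists are commonly produced by projecting the feature's decoder direction onto the model's unembedding and keeping the k most negative and most positive tokens. The sketch below assumes that method (ignoring any final layer norm) and uses a made-up two-dimensional toy vocabulary; the function names, vectors, and values are hypothetical and are not this dashboard's code or data.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative tokens (ascending) and the k most positive (descending)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


# Hypothetical toy data: a 2-d residual stream and a four-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = {
    "houſe": [0.8, 0.4],
    "myſelf": [0.6, 0.6],
    ",": [-0.4, -0.2],
    "(": [-0.1, -0.4],
}

negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("Negative Logits:", [(t, round(v, 2)) for t, v in negative])
print("Positive Logits:", [(t, round(v, 2)) for t, v in positive])
```

In a real model the decoder direction and unembedding rows have thousands of dimensions and the vocabulary has tens of thousands of tokens, but the ranking step is the same.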
Activations Density 0.019%
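Activation density is usually the share of token positions on which the feature fires, so 0.019% corresponds to about 19 active positions per 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero; the `threshold` parameter and the synthetic activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic example: 100,000 token positions, of which 19 are active.
acts = [0.0] * 100_000
for i in range(19):
    acts[i * 5_000] = 1.0

density = activation_density(acts)
print(f"{density:.3f}%")  # 0.019%
```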