Explanations
References to scientific methods and findings in research
New Auto-Interp
Negative Logits
purpoſe    -1.15
itſelf     -1.11
myſelf     -1.05
ſeveral    -1.02
ſtate      -1.01
Majefty    -1.01
fubject    -1.01
pleaſure   -1.01
Efq        -0.99
Reſ        -0.99
Positive Logits
the            1.03
(unrendered)   0.76
a              0.74
The            0.68
an             0.63
Par            0.59
↵↵             0.59
:              0.59
(              0.58
,              0.55
Activation density: 0.613%
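Activation density is usually read as the share of tokens on which a feature fires. The sketch below shows that calculation. It assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the sample activations are hypothetical and only illustrate the arithmetic. They are not the dashboard's actual pipeline or data.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as active on a token when its activation
    is strictly greater than `threshold` (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical example: a feature that fires on 3 of 1,000 tokens.
acts = [0.0] * 1000
acts[10], acts[200], acts[750] = 0.4, 1.2, 0.05
print(f"{activation_density(acts):.3f}%")  # prints "0.300%"
```

On this reading, a density of 0.613% means the feature is active on roughly 6 of every 1,000 tokens.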