Explanations
sections of text that focus on theoretical or speculative frameworks in various fields
Negative Logits
itſelf       -1.15
Majefty      -1.07
iſt          -1.06
themſelves   -1.05
ſelves       -1.04
myſelf       -1.04
Monfieur     -1.01
ſelf         -1.01
ſy           -0.99
Jefus        -0.99
Positive Logits
,                    0.90
.                    0.82
p                    0.69
(no visible token)   0.65
</strong>            0.65
↵                    0.65
<strong>             0.65
to                   0.65
l                    0.64
in                   0.64
Activations Density: 0.132%