Explanations
beginning markers in written content
Negative Logits
Token   Logit
de      -0.98
del     -0.95
sa      -0.92
A       -0.91
int     -0.90
is      -0.90
in      -0.89
et      -0.88
I       -0.87
y       -0.86
Positive Logits
Token      Logit
itſelf     1.63
doubtnut   1.51
myſelf     1.51
pleaſure   1.45
Anſ        1.40
uſed       1.38
་་         1.35
unſ        1.34
ſelf       1.33
poffible   1.33
Activations Density: 0.151%