INDEX
Explanations
repeated structure, possibly focusing on the article "the"
Negative Logits
Token       Logit
myſelf      -1.48
Efq         -1.47
Majefty     -1.45
purpoſe     -1.35
auroit      -1.34
Monfieur    -1.34
ainfi       -1.33
auffi       -1.32
avoient     -1.29
Jefus       -1.27
Positive Logits
Token         Logit
The           2.08
The           1.48
THE           1.38
THE           1.30
the           0.90
A             0.87
(not shown)   0.81
These         0.80
An            0.79
This          0.73
Activations Density 0.559%