INDEX
Explanations
the word "the" across various contexts in the text
Negative Logits
Efq         -1.52
Theſe       -1.39
itſelf      -1.39
Monfieur    -1.35
ſeveral     -1.34
myſelf      -1.32
iſt         -1.31
whoſe       -1.31
himſelf     -1.30
étoient     -1.30
Positive Logits
of          1.36
du          0.86
(           0.79
(blank)     0.79
del         0.79
des         0.78
-           0.77
of          0.77
↵↵          0.73
.           0.72
Activations Density 0.629%