Explanations
references to the word "the," which is primarily used to identify and focus on specific objects, concepts, or entities within a text
Negative Logits
The                   -0.88
(no visible token)    -0.77
so                    -0.75
'                     -0.73
"                     -0.73
in                    -0.70
che                   -0.69
part                  -0.68
sub                   -0.68
…                     -0.68
Positive Logits
Majefty       1.38
raiſ          1.30
Monfieur      1.29
ſever         1.25
myſelf        1.25
purpoſe       1.24
themſelves    1.23
poffible      1.20
avoient       1.19
Efq           1.19
Activations Density: 0.367%