INDEX
Explanations
Specific mentions of the word "The".
Repetitions of the word "the".
Negative Logits
Token        Logit
pers         -0.84
tec          -0.68
.","         -0.68
blem         -0.67
soever       -0.66
ée           -0.66
ben          -0.66
wart         -0.65
ecided       -0.65
won          -0.65
Positive Logits
Token            Logit
latter           1.01
aforementioned   0.98
foregoing        0.94
latest           0.93
simplest         0.93
same             0.91
emergence        0.90
oret             0.89
largest          0.84
aftermath        0.83
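The two logit lists rank tokens by how strongly this feature pushes their output logits down (negative) or up (positive). The page does not say how those scores are computed. A common approach, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then keep the k most negative and k most positive tokens. The helper names `logit_effects` and `top_and_bottom`, and the toy vectors in the test, are hypothetical and for illustration only.

```python
# Minimal sketch, assuming logit scores = decoder direction . unembedding vector.
# Function names are hypothetical; this is not the dashboard's actual code.
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Direct-path logit contribution of the feature for each token (assumed definition)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive tokens, k most negative tokens), strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Usage with a few scores copied from the tables above.
pos, neg = top_and_bottom(
    {"latter": 1.01, "aforementioned": 0.98, "pers": -0.84, "tec": -0.68}, 2
)
print(pos)  # [('latter', 1.01), ('aforementioned', 0.98)]
print(neg)  # [('pers', -0.84), ('tec', -0.68)]
```

Read this way, the positive list ("latter", "aforementioned", "foregoing", "same") is consistent with the explanations: after "the", the feature favors words that typically follow that article.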
Activation Density: 0.437%
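Activation density is usually read as the percentage of tokens in a sample on which the feature fires. At 0.437%, that is roughly 4.37 tokens per 1,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the threshold default are hypothetical choices, not taken from the dashboard.

```python
# Minimal sketch: percentage of tokens whose activation exceeds a threshold.
# Assumption: "fires" means activation > threshold (default 0.0).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage (0-100) of tokens on which the feature fires."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 25.0
```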