INDEX
Explanations
Mentions of "the" and its variants, such as "The", in the text
Negative Logits
ther          -0.17
ightly        -0.17
Fus           -0.16
icum          -0.16
mer           -0.15
ns            -0.15
actly         -0.15
fus           -0.15
sec           -0.14
structions    -0.14
Positive Logits
oretical      0.25
odore         0.18
orem          0.17
oret          0.17
Ģ             0.16
atre          0.16
viso          0.16
ostel         0.15
ERSHEY        0.15
tul           0.15
Activations Density 0.301%
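As a rough illustration of what a density figure like this measures, the sketch below computes the fraction of token positions on which a feature is active. It assumes that "active" means an activation strictly above a threshold of zero, which may differ from the dashboard's exact definition; the function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 1000 token positions:
# the feature fires on 3 of them.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]

density = activation_density(sample)
print(f"{density:.3%}")
```

A density near 0.3% would mean the feature fires on roughly 3 of every 1000 tokens in the sampled text.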