INDEX
Explanations
instances of the word "it" and variations in capitalization
Negative Logits
from          -0.21
on            -0.20
with          -0.19
which         -0.19
have          -0.19
during        -0.17
throughout    -0.17
within        -0.17
hip           -0.16
-on           -0.16
Positive Logits
'll           0.48
's            0.46
iner          0.46
'd            0.41
’ll           0.39
’s            0.37
chy           0.36
’d            0.33
inerary       0.32
've           0.31
Activations Density: 0.399%