Explanations
starts sentences with pronouns
Negative Logits
Token          Value
ensures        0.24
modify         0.24
represents     0.23
designates     0.23
modifies       0.23
initialize     0.23
initializes    0.23
initialized    0.22
\              0.22
determines     0.21
Positive Logits
Token          Value
They           0.26
there          0.23
they           0.23
<unused1810>   0.23
<unused2017>   0.22
<unused370>    0.22
<unused279>    0.21
<unused582>    0.21
<unused541>    0.21
<unused291>    0.21
Activations Density 0.363%