    Explanations

    starts sentences with pronouns

    Negative Logits
    Token             Value
    " ensures"        0.24
    " modify"         0.24
    " represents"     0.23
    " designates"     0.23
    " modifies"       0.23
    " initialize"     0.23
    " initializes"    0.23
    " initialized"    0.22
    "\"               0.22
    " determines"     0.21
    Positive Logits
    Token             Value
    "They"            0.26
    "there"           0.23
    "they"            0.23
    "<unused1810>"    0.23
    "<unused2017>"    0.22
    "<unused370>"     0.22
    "<unused279>"     0.21
    "<unused582>"     0.21
    "<unused541>"     0.21
    "<unused291>"     0.21
    Activation Density: 0.363%

    No Known Activations