    Explanations

    words related to updating, changing, and influencing

    forms of "to be"

    Negative Logits (tokens quoted to show leading spaces)
    Token          Logit
    " is"          -0.91
    " does"        -0.77
    " has"         -0.77
    " knows"       -0.72
    " goes"        -0.71
    " gets"        -0.69
    " takes"       -0.66
    " realizes"    -0.66
    " becomes"     -0.66
    " begins"      -0.65
    Positive Logits (tokens quoted to show leading spaces)
    Token          Logit
    " were"        1.40
    " are"         1.27
    " weren"       1.16
    " WERE"        1.13
    "were"         1.12
    " ARE"         1.02
    " aren"        0.97
    "Were"         0.96
    " Were"        0.91
    "are"          0.91
    Activation Density: 4.196%

    No Known Activations
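
The activation density figure above can be read as the share of sampled token positions on which the feature fires. The page does not define the metric, so that reading is an assumption. The sketch below only illustrates that reading: the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active (activation > threshold); the source page does not define it.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 8 token positions (illustrative only).
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3%}")  # prints 12.500%
```

Under this reading, a density of 4.196% would mean the feature is active on roughly 1 in 24 sampled positions.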