INDEX
    Explanations

    patterns matching the structure 'X after Y'

    elements related to transitions or changes in states

    Negative Logits
    "arb"       -0.70
    "utable"    -0.70
    "ortion"    -0.67
    "fit"       -0.60
    "irable"    -0.60
    " metic"    -0.59
    "agon"      -0.58
    "UX"        -0.57
    " Kits"     -0.57
    "ITE"       -0.57
    Positive Logits
    " AFTER"         1.60
    "before"         1.58
    " before"        1.57
    " after"         1.51
    "after"          1.51
    " BEFORE"        1.48
    " afterward"     1.41
    " afterwards"    1.36
    " Before"        1.26
    " After"         1.25
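
    The positive-logit tokens above look like case and whitespace variants of a few temporal words, which fits the 'X after Y' explanation. A minimal sketch of that check follows, assuming the tokens and values are copied by hand from this listing. The helper name `group_by_word` is hypothetical and not part of any dashboard tooling.

```python
from collections import defaultdict

# Top positive-logit tokens and values copied from the listing above.
# Leading spaces are part of the token text and are kept on purpose.
positive_logits = [
    (" AFTER", 1.60),
    ("before", 1.58),
    (" before", 1.57),
    (" after", 1.51),
    ("after", 1.51),
    (" BEFORE", 1.48),
    (" afterward", 1.41),
    (" afterwards", 1.36),
    (" Before", 1.26),
    (" After", 1.25),
]


def group_by_word(pairs):
    """Group logit values by the token's whitespace-stripped, lowercased form.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for token, value in pairs:
        groups[token.strip().lower()].append(value)
    return dict(groups)


groups = group_by_word(positive_logits)
for word, values in groups.items():
    print(f"{word}: {len(values)} variant(s), max logit {max(values):.2f}")
```

    Under this normalization the ten tokens reduce to four word forms: "after", "before", "afterward", and "afterwards". The grouping covers only the top-ten list shown here, so it says nothing about tokens further down the ranking.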
    Act Density 0.204%

    No Known Activations