INDEX
    Explanations

    phrases indicating actions that are performed or not performed

    instances of the word "did."

    Negative Logits
    " Methods"     -0.71
    "liner"        -0.70
    "Tier"         -0.70
    "Tai"          -0.69
    "washer"       -0.68
    "case"         -0.67
    "oided"        -0.67
    " Handling"    -0.66
    "stood"        -0.66
    "bent"         -0.66
    Positive Logits
    "actic"        0.99
    "pez"          0.88
    "ĸļ"           0.84
    " not"         0.82
    " confir"      0.81
    " indeed"      0.77
    "oms"          0.75
    "anos"         0.74
    "ppel"         0.74
    " manage"      0.74
    Activation Density: 0.079%

    No Known Activations