INDEX
    Explanations

    future actions or decisions

    instances of the word "do" and its variations, indicating inquiries or commands about actions

    Negative Logits
     Entered      -0.98
     Frie         -0.72
    gart          -0.70
    theless       -0.67
    sent          -0.65
    tro           -0.63
    Ĭ±            -0.63
    inently       -0.62
    printed       -0.61
    ware          -0.61
    Positive Logits
    pez           1.09
    atives        0.74
    berman        0.70
    etting        0.68
     wrong        0.66
    ggy           0.65
    INGS          0.65
     differently  0.65
    ":"           0.64
    ients         0.63
    Activation Density: 0.063%

    No Known Activations