INDEX
    Explanations

    actions or tasks being done

    instances of the phrase "do it."

    Negative Logits
     Opposition   -0.68
     Flavoring    -0.67
     Returning    -0.63
    ²             -0.60
     Represent    -0.60
     Ware         -0.58
     opinions     -0.58
     advisors     -0.57
    ONSORED       -0.56
     Arm          -0.56
    Positive Logits
    alian         1.09
     justice      0.83
    alia          0.81
    chy           0.81
    wrong         0.81
    pez           0.79
    self          0.79
    lez           0.77
     differently  0.76
    unes          0.76
    Act Density 0.049%

    No Known Activations