    Explanations

    requests or statements followed by a specific action or instruction

    instances of the word "Please," indicating a request or instruction

    Negative Logits
    Token           Logit
    "ounter"        -0.70
    " attributes"   -0.64
    " early"        -0.64
    " shifting"     -0.64
    " secret"       -0.63
    " diss"         -0.63
    " marvel"       -0.62
    " prototyp"     -0.62
    " skirm"        -0.62
    " latent"       -0.61

    Positive Logits
    Token           Logit
    " Please"        3.25
    "Please"         2.34
    " PLEASE"        2.26
    "please"         2.13
    " please"        2.05
    " Sorry"         1.43
    " Thank"         1.41
    "PLE"            1.25
    " Feel"          1.24
    " Therefore"     1.22

    Activation density: 0.023%

    No Known Activations