    Explanations

    phrases related to requests and commands

    Negative Logits
    " Dün"        -0.16
    "lyn"         -0.16
    "ogo"         -0.15
    "osp"         -0.14
    "neau"        -0.14
    "tram"        -0.14
    ".bc"         -0.14
    "strup"       -0.14
    "ontent"      -0.14
    " ourselves"  -0.13
    Positive Logits
    " please"     0.30
    "please"      0.26
    " Please"     0.21
    "Please"      0.21
    " PLEASE"     0.21
    " bitte"      0.19
    "请"          0.18
    " ple"        0.18
    "，请"        0.17
    "SHOW"        0.16
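    The two logit lists show which output tokens this feature most directly promotes or suppresses. Below is a minimal sketch of how such lists are commonly derived. It assumes that each score is the dot product of the feature's decoder vector with that token's column of the model's unembedding matrix, and it ignores any final layer norm. The function name `logit_effects` and its parameters are hypothetical and do not come from the dashboard's own code.

```python
from typing import List, Sequence, Tuple

def logit_effects(
    decoder_vec: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top_positive, top_negative) lists of (token, score).

    Hypothetical helper. Assumption: a feature's direct effect on a token's
    logit is decoder_vec . w_u[:, token], with w_u shaped (d_model, n_vocab).
    """
    d_model = len(decoder_vec)
    # Score every vocabulary entry by projecting the decoder vector onto its unembedding column.
    scores = [
        sum(decoder_vec[j] * w_u[j][t] for j in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most suppressed tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
pos, neg = logit_effects([1.0, 0.0], [[0.3, -0.2, 0.1], [5.0, 5.0, 5.0]], ["a", "b", "c"], k=2)
print(pos)
print(neg)
```

    The toy call should print `[('a', 0.3), ('c', 0.1)]` followed by `[('b', -0.2), ('c', 0.1)]`. Because the second row of `w_u` is multiplied by a zero component, it has no effect on any score.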
    Activation Density: 0.112%
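    Activation density is the share of tokens on which the feature fires. Below is a minimal sketch that assumes a token counts as active when the feature's activation exceeds a threshold of 0. The `activation_density` helper and its `threshold` parameter are hypothetical and illustrative only.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper; the default threshold of 0 is an assumption.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

print(activation_density([0.0, 0.0, 0.5, 0.0]))  # one active token out of four
```

    At 0.112%, the feature would be active on roughly 112 of every 100,000 tokens.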

    No Known Activations