INDEX
    Explanations

    pronouns and words associated with requests or permissions

    Negative Logits
    ise                 -0.17
     apart              -0.16
     Sims               -0.15
     win                -0.15
     env                -0.15
     ph                 -0.14
     Kare               -0.14
     away               -0.14
     ship               -0.14
     Trace              -0.14
    Positive Logits
    ApplicationBuilder  0.16
    ãĥ³ãĤ¸              0.15
    ior                 0.15
    bjerg               0.15
    svp                 0.15
    avel                0.15
    668                 0.15
    Ïĥι                 0.14
    elper               0.14
    jax                 0.14
    Act Density 0.000%

    No Known Activations