INDEX
    Explanations

    phrases indicating persuasion or attempts to convince others

    Negative Logits
    "alue"         -0.17
    "arkan"        -0.15
    ".nih"         -0.15
    "省"           -0.14
    "太郎"         -0.14
    " 친"          -0.14
    "occasion"     -0.14
    ".readValue"   -0.14
    "roj"          -0.14
    "úp"           -0.14
    Positive Logits
    " exc"          0.16
    "432"           0.15
    "å"             0.15
    " drives"       0.15
    "elsen"         0.14
    "ought"         0.14
    " poil"         0.14
    "离"            0.14
    "itzer"         0.14
    " licensors"    0.14
    Activation Density: 0.019%
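    Activation density is the share of sampled tokens on which the feature fires, so 0.019% corresponds to roughly 19 firing tokens per 100,000. A minimal sketch, assuming activations are collected as a flat array over a token sample and that "fires" means strictly above zero; the threshold and the helper name `activation_density` are assumptions, not this dashboard's documented method.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation is strictly above `threshold`."""
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("need at least one token activation")
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Synthetic example: 19 firing tokens out of 100,000 (made-up data).
acts = np.zeros(100_000)
acts[:19] = 1.0
print(activation_density(acts))  # 0.019
```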

    No Known Activations