INDEX
    Explanations

    phrases indicating urgency or immediacy

    Negative Logits
    azio        -0.15
    wang        -0.14
    ataka       -0.14
     Pointer    -0.14
     Bars       -0.13
    243         -0.13
    erais       -0.13
     Brands     -0.13
     sare       -0.13
    ith         -0.13
    Positive Logits
    åĪĹ         0.17
    ller        0.16
    ousel       0.15
    URA         0.15
    że          0.15
    çµ¶         0.14
    liest       0.14
    ATEGORY     0.14
    iples       0.14
    Forbidden   0.14
    Act Density 0.005%

    No Known Activations