INDEX
    Explanations

    phrases expressing desires or wishes to engage in actions

    Negative Logits
    ady       -0.18
    aday      -0.15
    ọn        -0.15
    ses       -0.14
    ader      -0.14
    wa        -0.14
    lus       -0.14
    .Fire     -0.14
    alli      -0.13
    PLE       -0.13
    Positive Logits
    oba        0.15
    اوه        0.15
     Sabb      0.14
    .crm       0.14
    nodoc      0.14
    cox        0.13
     ë¶        0.13
    azar       0.13
     atol      0.13
    antom      0.13
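    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. For sparse-autoencoder features, such lists are commonly produced by multiplying the feature's decoder direction by the model's unembedding matrix and taking the k lowest and highest entries. This page does not say how its lists were computed, so treat that method as an assumption. The sketch below uses small random matrices as hypothetical stand-ins for the decoder direction and the unembedding matrix `W_U`.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    decoder_dir: shape (d_model,), a hypothetical feature decoder direction.
    W_U: shape (d_model, vocab_size), a hypothetical unembedding matrix.
    """
    scores = decoder_dir @ W_U      # direct effect of the feature on each token's logit
    order = np.argsort(scores)      # ascending order: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with random weights (not the real model's weights).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 50
vocab = [f"tok{i}" for i in range(vocab_size)]
decoder_dir = rng.normal(size=d_model)
W_U = rng.normal(size=(d_model, vocab_size))
negative, positive = top_logit_effects(decoder_dir, W_U, vocab, k=3)
print(negative)
print(positive)
```

    Sorting once with `argsort` and reading both ends keeps the two lists consistent with each other. For a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.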
    Activation Density: 0.020%

    No Known Activations
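    Activation density is the share of tokens on which the feature fires. A density of 0.020% corresponds to about 2 tokens in every 10,000. The minimal sketch below assumes that "fires" means an activation strictly greater than zero, and it uses made-up activations rather than data from this feature.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    acts = np.asarray(acts)
    return float(np.count_nonzero(acts > threshold)) / acts.size

# Hypothetical activations over 10,000 tokens, exactly 2 of which are nonzero.
acts = np.zeros(10_000)
acts[[17, 4242]] = [1.3, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.020%
```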