INDEX
    Explanations

    phrases indicating personal reflection or introspection

    Negative Logits
    "oden"       -0.16
    "agra"       -0.15
    "_gs"        -0.15
    "dbuf"       -0.15
    "aphore"     -0.15
    "ائر"        -0.14
    "alık"       -0.14
    ".fd"        -0.14
    "isty"       -0.14
    "گاه"        -0.14

    Positive Logits
    " Fem"        0.17
    " Lad"        0.17
    "lick"        0.15
    "l"           0.15
    "rough"       0.15
    "sel"         0.14
    "uring"       0.14
    "zes"         0.14
    " ZIP"        0.14
    "ilm"         0.14
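    These two lists rank vocabulary tokens by how much this feature lowers or raises their output logits. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction onto the model's unembedding matrix W_U and keep the k most negative and k most positive entries (k = 10 matches the lists above). The sketch uses plain Python lists; `logit_effects` and `top_logits` are hypothetical helper names, and any layer-norm folding a real pipeline might apply is ignored.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score every vocabulary token by decoder_dir . W_U[:, t] (assumed method).

    decoder_dir: list of d_model floats, the feature's decoder vector.
    unembed: d_model rows, each a list of len(vocab) floats (W_U).
    vocab: list of token strings, indexed like the columns of W_U.
    """
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with column t of W_U.
        s = sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        scores.append((token, s))
    return scores


def top_logits(decoder_dir, unembed, vocab, k=10):
    """Return (negative, positive) lists of (token, score), strongest first."""
    ranked = sorted(logit_effects(decoder_dir, unembed, vocab), key=lambda p: p[1])
    negative = ranked[:k]          # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]    # most promoted tokens, most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens (made-up numbers).
    vocab = ["a", "b", "c", "d"]
    decoder_dir = [1.0, -0.5]
    unembed = [[0.2, -0.1, 0.0, 0.3],
               [0.4, 0.2, -0.2, 0.0]]
    neg, pos = top_logits(decoder_dir, unembed, vocab, k=2)
    print([t for t, _ in neg], [t for t, _ in pos])  # ['b', 'a'] ['d', 'c']
```

    Because only the decoder direction and W_U enter the calculation, these lists can be computed even for a feature that never fires, which is why they appear here alongside "No Known Activations".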
    Activation Density: 0.000%
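    Activation density is read here as the fraction of sampled token positions on which the feature fires; a value that rounds to 0.000% is consistent with "No Known Activations". The sketch assumes "fires" means an activation strictly above zero and that density is reported as a percentage with three decimals; the threshold and the sampled dataset are assumptions, not stated on this page, and `activation_density` / `format_density` are hypothetical names.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where activation > threshold (assumed definition)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def format_density(density):
    """Render a density fraction as a percentage with three decimals."""
    return f"{density * 100:.3f}%"


if __name__ == "__main__":
    print(format_density(activation_density([0.0, 0.0, 1.2, 0.0])))  # 25.000%
    print(format_density(activation_density([0.0] * 1000)))         # 0.000%
```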

    No Known Activations