INDEX
    Explanations

    words that indicate personal responsibility or acknowledgment of self

    Negative Logits
    agas  -0.16
     MatSnackBar  -0.14
    atives  -0.14
    ibar  -0.14
    agar  -0.14
     Ment  -0.14
    echn  -0.14
     Manning  -0.14
    亿åħĥ  -0.13
     è£  -0.13
    Positive Logits
    abei  0.16
    iddi  0.15
    ãģıãģł  0.15
    ypical  0.14
    vell  0.14
    eron  0.14
    arna  0.14
     Hund  0.14
    kaar  0.14
    دÙĩ  0.14
    Activation Density: 0.032%

    No Known Activations