INDEX
    Explanations

    harmful thoughts not defining you

    Negative Logits
    Token            Value
    " pode"          0.49
    " utilizado"     0.47
    " comentário"    0.46
    " فَ"            0.46
    " utilizados"    0.45
    " Ё"             0.44
    "encije"         0.44
    " comentários"   0.44
    " буты"          0.44
    " suro"          0.44
    Positive Logits
    Token    Value
    "a"      0.52
    "in"     0.49
    "i"      0.48
    "w"      0.44
    "o"      0.43
    "le"     0.41
    "1"      0.41
    "9"      0.41
    "pr"     0.41
    "个人"   0.40
    Activation Density: 0.002%

    No Known Activations