INDEX
    Explanations

    negative feelings

    Negative Logits
        " kể"        -0.07
        ".Alter"     -0.06
        " corrupt"   -0.06
        " Km"        -0.06
        " legend"    -0.06
        " EMAIL"     -0.06
        " born"      -0.06
        "لم"         -0.06
        "(ai"        -0.06
        ".ed"        -0.06
    Positive Logits
        "_Path"          0.07
        ".Anchor"        0.06
        "_Man"           0.06
        " már"           0.06
        " вказ"          0.06
        " яким"          0.06
        (not rendered)   0.06
        "ैसल"            0.06
        (not rendered)   0.06
        "(shift"         0.06
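The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. One common way to obtain such a ranking is a minimal sketch like the following, which assumes (it is not stated on this page) that a token's logit effect is the dot product of the feature's decoder direction with that token's unembedding vector. The function names `logit_effects` and `top_logits` and the toy vectors are hypothetical.

```python
# Minimal sketch (assumption): a feature's effect on a token's logit is
# dot(decoder_dir, unembedding vector of that token). Names are hypothetical.

def logit_effects(decoder_dir, unembed):
    """decoder_dir: list of d_model floats; unembed: dict token -> list of d_model floats."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_logits(effects, k):
    """Return (k most negative, k most positive) effects as (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

if __name__ == "__main__":
    # Toy 2-dimensional example with a three-token vocabulary.
    effects = logit_effects([1.0, 0.0], {"a": [0.5, 1.0], "b": [-0.2, 3.0], "c": [0.1, 0.0]})
    neg, pos = top_logits(effects, 1)
    print(neg, pos)  # [('b', -0.2)] [('a', 0.5)]
```

With a real model the vocabulary has tens of thousands of entries and `k` would be 10 to match the lists shown here.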
    Activation Density 0.135%
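Activation density is the share of tokens on which the feature is active. The sketch below assumes (the page does not define it) that "active" means an activation strictly greater than zero over some sample of tokens; the function name `activation_density` is hypothetical. Under that definition, 27 firing tokens out of 20,000 would give the 0.135% shown here.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

if __name__ == "__main__":
    # Hypothetical sample: 27 firing tokens among 20,000.
    sample = [1.0] * 27 + [0.0] * (20_000 - 27)
    print(f"{activation_density(sample):.3f}%")  # 0.135%
```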

    No Known Activations