INDEX
    Negative Logits
        "this"      0.76
        " this"     0.73
        " halides"  0.73
        " dieser"   0.67
        "-"         0.64
        " chodzi"   0.63
        "!"         0.60
        "accar"     0.59
        " cocon"    0.57
        "?"         0.57
    Positive Logits
        "س"         0.74
        " ل"        0.70
        (token not rendered)  0.60
        "ные"       0.59
        "ченные"    0.59
        "л"         0.58
        "с"         0.57
        " یه"       0.57
        (token not rendered)  0.57
        "}$&$"      0.57
    Activation Density: 0.600%

    No Known Activations