    Negative Logits
    "که"          0.38
    "یم"          0.32
    "г"           0.30
    " önce"       0.30
    "ますが"       0.29
    "화를"         0.29
    " óleo"       0.28
    " emoción"    0.28
    "دی"          0.27
    "чення"       0.27
    Positive Logits
    "ar"          0.49
    "k"           0.44
    "an"          0.42
    "z"           0.41
    "ad"          0.39
    "in"          0.38
    "x"           0.38
    "ro"          0.37
    "er"          0.35
    "ur"          0.35
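The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. The sketch below shows one common way such lists are produced, assuming (not confirmed by this page) that each token's score is the dot product of a feature direction with that token's column of an unembedding matrix. The names `logit_scores`, `top_logits`, `feature_dir`, and `w_u` are hypothetical, and the toy numbers are illustrative only, not this feature's data.

```python
from typing import Dict, List, Sequence, Tuple


def logit_scores(
    feature_dir: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Score each token as dot(feature_dir, w_u[:, token]).

    Assumption: w_u is laid out as d_model rows by vocab_size columns.
    """
    scores: Dict[str, float] = {}
    for v, tok in enumerate(vocab):
        scores[tok] = sum(feature_dir[j] * w_u[j][v] for j in range(len(feature_dir)))
    return scores


def top_logits(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k highest-scoring tokens, k lowest-scoring tokens)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["ar", "k", "که", " önce"]
feature_dir = [1.0, 0.5]
w_u = [
    [0.4, 0.3, -0.2, -0.1],
    [0.2, 0.2, -0.4, 0.0],
]
pos, neg = top_logits(logit_scores(feature_dir, w_u, vocab), k=2)
print("positive:", pos)
print("negative:", neg)
```

Note that tokens keep any leading space (as in " önce"), since tokenizers treat the space as part of the token.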
    Activation Density: 0.725%
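Activation density is commonly read as the share of sampled tokens on which the feature fires. The sketch below computes it under that assumption, treating "fires" as an activation strictly above a threshold of zero; the dashboard's exact threshold and token sample are not stated here, and `activation_density` is a hypothetical helper name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = 100 * (#tokens with activation > threshold) / (#tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Illustrative only: 29 active tokens out of 4,000 gives 0.725%.
sample = [0.0] * 3971 + [1.0] * 29
print(activation_density(sample))
```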

    No Known Activations