INDEX
    Explanations

    up in, as a, like a, not broken, not applying

    Negative Logits
     היו
    0.57
     صورة
    0.55
     እንቅስቃሴ
    0.54
    မျက်
    0.52
    0.52
     gobiernos
    0.52
    ες
    0.51
     Δια
    0.51
     periodistas
    0.51
    ρου
    0.50
    Positive Logits
    an
    0.66
    a
    0.66
    in
    0.64
    e
    0.61
    f
    0.61
    h
    0.60
    ar
    0.59
    u
    0.59
    R
    0.57
    ed
    0.56
    Act Density 0.001%

    No Known Activations