INDEX
    Explanations
    Negative Logits
    0.88
    "ना"        0.84
    "ми"        0.82
    " in"       0.80
    "ور"        0.73
    "ла"        0.71
    "elow"      0.68
    "ができる"  0.67
    "ين"        0.67
    " في"       0.66
    Positive Logits
    " and"      0.82
    "s"         0.67
    " be"       0.66
    " with"     0.65
    " not"      0.64
    " Y"        0.64
    "from"      0.63
    "tk"        0.60
    " jurnal"   0.58
    " S"        0.58
    Act Density 0.001%

    No Known Activations