INDEX
    NEGATIVE LOGITS
    " be"      1.01
    " as"      0.81
    " are"     0.63
    " llev"    0.61
    "lL"       0.60
    "行う"     0.60
               0.60
    "{"        0.59
    " לג"      0.59
    " feria"   0.59
    POSITIVE LOGITS
    "in"       0.92
    "0"        0.92
    "ه"        0.76
    "те"       0.72
    "ust"      0.68
    "an"       0.67
    "ти"       0.67
    " K"       0.66
               0.66
               0.65
    Act Density 0.028%

    No Known Activations