    NEGATIVE LOGITS
    " または"              0.55
    "ية"                   0.53
    " ו"                   0.52
    " există"              0.51
    "または"               0.51
    (token not shown)      0.51
    (token not shown)      0.50
    (token not shown)      0.50
    " iako"                0.50
    " është"               0.50
    POSITIVE LOGITS
    "a"                    1.03
    "o"                    0.70
    "the"                  0.59
    "ه"                    0.57
    "The"                  0.52
    "ش"                    0.51
    "tr"                   0.48
    "u"                    0.48
    (token not shown)      0.48
    "g"                    0.46
    Activation density: 0.097%

    No Known Activations