INDEX
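    Negative Logits
    Token               Value
    "،"                 0.58
    [token not shown]   0.51
    " ،"                0.42
    [token not shown]   0.42
    [token not shown]   0.39
    "."                 0.38
    "、("               0.36
    ".$,"               0.36
    [token not shown]   0.34
    [token not shown]   0.34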
    Positive Logits
    Token               Value
    "if"                0.33
    "is"                0.33
    " We"               0.32
    "ain"               0.31
    [token not shown]   0.30
    " it"               0.30
    " we"               0.30
    "если"              0.29
    " I"                0.29
    "we"                0.29
    Activation Density: 0.457%

    No Known Activations