INDEX
    Explanations
    Negative Logits
    t      0.81
    ب      0.79
    D      0.76
    in     0.71
    YD     0.71
    st     0.69
    set    0.69
    m      0.68
    the    0.68
    win    0.67
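    Positive Logits
    (no visible token)    1.01
    (no visible token)    0.88
    (no visible token)    0.77
    і                     0.77
    é                     0.74
    (no visible token)    0.72
    (no visible token)    0.72
    (no visible token)    0.71
    (no visible token)    0.70
    (no visible token)    0.69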
    Activation Density 0.001%

    No Known Activations