    Negative Logits
    Token       Logit
    "I"         0.92
    "3"         0.92
    "4"         0.91
    "6"         0.84
    "٣"         0.84
    "IE"        0.82
    "5"         0.79
    "ים"        0.79
                0.78
    "Q"         0.77
    Positive Logits
    Token       Logit
    " "         1.21
    " to"       0.91
    "y"         0.82
    "zelfde"    0.80
    " an"       0.72
    " that"     0.69
    "то"        0.69
    " is"       0.68
    " it"       0.67
    "to"        0.67
    Act Density 0.879%

    No Known Activations