INDEX
    Negative Logits
    Token        Logit
    "ib"         -1.57
    "IB"         -1.22
    "ibs"        -1.05
    "ibe"        -0.93
    " ib"        -0.92
    "ibir"       -0.86
    "ibal"       -0.79
    " IB"        -0.79
    "iba"        -0.78
    " سكانية"    -0.77
    Positive Logits
    Token        Logit
    "y"          0.67
    "s"          0.60
    "es"         0.56
    "as"         0.42
    "capp"       0.41
    "ita"        0.41
    "ys"         0.40
    "inai"       0.40
    [not shown]  0.40
    "ет"         0.40
    Activation Density: 0.011%

    No Known Activations