INDEX
    Explanations
    NEGATIVE LOGITS
    " thinner"         -0.07
    "chia"             -0.07
    " intellectual"    -0.07
    " cuisine"         -0.06
    "うち"             -0.06
    " Inputs"          -0.06
    " dose"            -0.06
    " Pakistan"        -0.06
    " integrated"      -0.06
    " academic"        -0.06
    POSITIVE LOGITS
    "args"             0.07
                       0.07
    "icont"            0.07
    "brig"             0.06
    " Vere"            0.06
    "änd"              0.06
                       0.06
    "едаг"             0.06
    "ulance"           0.06
    "yz"               0.06
    Activation Density: 0.030%

    No Known Activations