    Negative Logits
    Token           Logit
    ">`"            -0.08
    " repos"        -0.07
    ">.</"          -0.07
    "-old"          -0.07
    " ادا"          -0.07
    " bann"         -0.07
    " pillars"      -0.07
    "jen"           -0.07
    " ancillary"    -0.07
    " Jenkins"      -0.07
    Positive Logits
    Token           Logit
    " lining"       0.10
    " lined"        0.09
    " belakang"     0.08
    "lining"        0.08
    "-lined"        0.08
                    0.08
    " Everyday"     0.08
    " Vara"         0.08
    "slot"          0.07
    "tructor"       0.07
    Activation Density: 0.007%

    No Known Activations