    Negative Logits
    "uros"          -0.07
    "(rp"           -0.07
    "acency"        -0.07
    "РО"            -0.06
    "ры"            -0.06
    " چاپ"          -0.06
    (token not shown)  -0.06
    " kitap"        -0.06
    "retty"         -0.06
    " reput"        -0.06
    Positive Logits
    " foster"       0.15
    " Foster"       0.15
    " fostering"    0.10
    " fost"         0.09
    (token not shown)  0.08
    "ost"           0.07
    "ster"          0.07
    " HOST"         0.07
    " trailers"     0.07
    "INTR"          0.07
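The page does not say how these two lists are produced. On sparse-autoencoder dashboards they are commonly obtained by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the k most positive and k most negative token logits. The sketch below assumes that method; `project_onto_vocab`, `top_and_bottom`, and the toy vectors are hypothetical, and real pipelines may also fold in a final layer norm, which this ignores.

```python
# Hypothetical sketch: build top-k positive/negative logit lists by
# projecting a feature's decoder direction onto the unembedding.
# All weights below are invented toy values, not real model weights.

def project_onto_vocab(decoder_dir, unembed, vocab):
    """Return {token: logit}, where logit = decoder_dir . unembed[:, j]."""
    logits = {}
    for j, token in enumerate(vocab):
        # unembed is d_model x vocab_size: one row per model dimension
        logits[token] = sum(d * row[j] for d, row in zip(decoder_dir, unembed))
    return logits


def top_and_bottom(logits, k):
    """Return (k most positive, k most negative) (token, logit) pairs."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return positive, negative


vocab = [" foster", " Foster", "uros", "acency"]
unembed = [
    [0.5, 0.4, -0.2, 0.0],
    [0.1, 0.2, 0.1, -0.3],
]
decoder_dir = [0.3, 0.1]

logits = project_onto_vocab(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(logits, k=2)
print(positive)
print(negative)
```

With these toy weights, " foster" and " Foster" come out as the top positive tokens and "uros" and "acency" as the most negative, mirroring the shape of the lists above.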
    Activation Density: 0.002%
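Activation density is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; both that threshold and the `activation_density` helper are illustrative assumptions, not the dashboard's documented definition.

```python
# Hypothetical sketch: activation density as the percentage of tokens
# whose feature activation exceeds a threshold (assumed to be 0 here).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy sample: 2 active tokens out of 100,000 gives a density of 0.002%.
acts = [0.0] * 100_000
acts[10] = 1.3
acts[500] = 0.4
print(f"{activation_density(acts):.3f}%")  # prints 0.002%
```

At this density a feature fires on roughly one token in 50,000, so a finite sample of text may contain few or no recorded activations.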

    No Known Activations