    Negative Logits
    -0.08
     rik
    -0.08
    /em
    -0.08
     fic
    -0.07
    (loc
    -0.07
     dept
    -0.07
     relocation
    -0.07
    وين
    -0.07
     emigr
    -0.07
    (for
    -0.07
    Positive Logits
     biography
    0.09
     Biography
    0.09
     時計
    0.08
     Curious
    0.08
     primjer
    0.08
    maschine
    0.08
     tease
    0.08
    uchen
    0.08
     reciproc
    0.08
    fony
    0.08
    Activation Density 0.001%

    No Known Activations