    NEGATIVE LOGITS
    Token            Logit
    "وقت"            -0.07
    "igrants"        -0.07
    "이를"           -0.07
    "İS"             -0.07
    "ienia"          -0.06
    " dne"           -0.06
    " Noel"          -0.06
    " 먼저"          -0.06
    "ingga"          -0.06
    " immigrants"    -0.06
    POSITIVE LOGITS
    Token            Logit
    " Mach"          0.17
    " mach"          0.16
    "mach"           0.11
    " Bach"          0.09
    "ach"            0.08
    " trash"         0.08
    "ACH"            0.08
    " machining"     0.08
    " hete"          0.08
    "たち"           0.08
    Activation Density: 0.003%

    No Known Activations