    Negative Logits
    " قول"           -0.07
    "autor"          -0.07
    "<translation"   -0.07
    "ero"            -0.07
    "آ"              -0.07
    "gesture"        -0.06
    " rendre"        -0.06
    "。一"           -0.06
    " 하루"          -0.06
    Positive Logits
    " linked"        0.10
    " Linked"        0.09
    " linking"       0.09
    "-linked"        0.08
    " linkage"       0.07
    "linked"         0.07
    " listed"        0.06
    " Net"           0.06
    " refers"        0.06
    "Linked"         0.06
    Activation density: 0.009%

    No Known Activations