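    Negative Logits
        " необ"              -0.07
        "Thickness"          -0.07
        ".non"               -0.07
        "اهای"               -0.07
        " polys"             -0.07
        "ovaných"            -0.07
        " вра"               -0.06
        "کاری"               -0.06
        " ویر"               -0.06
        (token not shown)    -0.06

    Positive Logits
        " had"                0.07
        "\tentry"             0.07
        "“↵↵"                 0.07
        " ue"                 0.06
        "ै।↵↵"                0.06
        (token not shown)     0.06
        (token not shown)     0.06
        " mond"               0.06
        (token not shown)     0.06
        " fot"                0.06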
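The page does not say how the negative and positive logit lists are produced. Below is a minimal sketch that assumes the convention common to sparse-autoencoder feature dashboards. Under that convention, a feature's effect on each output token is its decoder direction multiplied by the model's unembedding matrix. The page then reports the k most negative and k most positive entries. The function names `logit_effects` and `top_and_bottom` and their inputs are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Project one feature's decoder direction through the unembedding.

    Assumption (hypothetical inputs): decoder_dir has length d_model and
    unembed has d_model rows, each of length vocab_size.
    Returns one logit effect per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int = 10):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    # Sort token ids from most negative to most positive effect.
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    # Take the k largest and list them from largest down.
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive
```

In this sketch, `top_and_bottom(logit_effects(d, W_U), vocab, k=10)` would yield two ten-row lists shaped like the ones above. The actual values depend entirely on the model and feature.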
    Activation Density: 0.008%

    No Known Activations
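Activation density is read here, as an assumption, as the fraction of evaluated tokens on which the feature's activation is strictly positive. The minimal sketch below follows that reading. The function name `activation_density` is hypothetical. Under this definition, 0.008% corresponds to roughly 8 active tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which a feature is active.

    Assumption: "active" means an activation strictly greater than zero,
    as for ReLU-style sparse-autoencoder features.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Example: one active token out of four gives a density of 0.25 (25%).
print(activation_density([0.0, 0.0, 1.2, 0.0]))
```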