    Negative Logits
        " شهرهای"        -0.07
        "inals"          -0.07
        "DEN"            -0.06
        " urb"           -0.06
        " told"          -0.06
        "istrat"         -0.06
        " hills"         -0.06
        "City"           -0.06
        " ville"         -0.06
        (unrendered token)  -0.06
    Positive Logits
        " 당신"           0.07
        " adopts"         0.07
        "edata"           0.07
        " effortlessly"   0.07
        " discretion"     0.06
        (unrendered token)  0.06
        " dx"             0.06
        " implement"      0.06
        "alsa"            0.06
        " با"             0.06
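The two lists above rank output tokens by how strongly the feature pushes their logits down (negative) or up (positive). A minimal sketch of that ranking step, assuming the per-token logit effects are already available as a plain mapping (on dashboards like this one they typically come from projecting the feature's decoder direction onto the unembedding, but that step is not shown here). The function name `top_logits` and the `sample` data are hypothetical; the sample values are a few rows copied from the lists above, rounded as displayed, so ties are broken by insertion order rather than by the true unrounded values.

```python
def top_logits(effects: dict[str, float], k: int) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs,
    each list ordered from the most extreme value inward."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort ascending by effect; Python's sort is stable, so ties keep insertion order.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    # Lowest k values, keeping only genuinely negative effects.
    most_negative = [item for item in ranked[:k] if item[1] < 0]
    # Highest k values, largest first, keeping only genuinely positive effects.
    most_positive = [item for item in reversed(ranked[-k:]) if item[1] > 0]
    return most_negative, most_positive


# Hypothetical sample: a few rows from the lists above (leading spaces are part of the tokens).
sample = {
    " شهرهای": -0.07,
    "DEN": -0.06,
    " told": -0.06,
    " adopts": 0.07,
    " dx": 0.06,
    " implement": 0.06,
}

neg, pos = top_logits(sample, 1)
print(neg)  # → [(' شهرهای', -0.07)]
print(pos)  # → [(' adopts', 0.07)]
```

Filtering by sign keeps a weakly positive token from appearing in the "negative" list when a feature has few strongly negative effects.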
    Activation Density: 0.004%
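An activation density of 0.004% works out to about 4 firing tokens per 100,000 tokens sampled (0.004 / 100 × 100,000 = 4), so this feature is extremely sparse. A minimal sketch of the arithmetic, assuming density is the percentage of sampled tokens on which the feature's activation is above zero; the function name `activation_density` and its `threshold` parameter are hypothetical, not the dashboard's own code.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical example: 100,000 tokens, feature fires on exactly 4 of them.
acts = [0.0] * 100_000
for i in (10, 2_000, 50_000, 99_999):
    acts[i] = 1.5

print(activation_density(acts))  # → 0.004
```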

    No Known Activations