    NEGATIVE LOGITS
    Token           Value
    "ত্র"            -0.08
    "marca"         -0.08
    "atcher"        -0.08
    "lah"           -0.07
    "andenburg"     -0.07
    "obe"           -0.07
    " herr"         -0.07
    "enha"          -0.07
    "Hai"           -0.07
    "wy"            -0.07
    POSITIVE LOGITS
    Token           Value
    (not rendered)  0.08
    " Eg"           0.08
    " pinpoint"     0.07
    " captured"     0.07
    " ل"            0.07
    " pret"         0.07
    (not rendered)  0.07
    "(comm"         0.07
    " bully"        0.07
    " skew"         0.07
    Act Density 0.002%

    No Known Activations