    Negative Logits
    Token            Logit
    " همچنین"        -0.08
    " accelerated"   -0.08
    "شاه"            -0.08
    " Stim"          -0.07
    " toutefois"     -0.07
    "avad"           -0.07
    " odam"          -0.07
    " biri"          -0.07
    " cependant"     -0.07
    " zamanda"       -0.07
    Positive Logits
    Token            Logit
    "-called"        0.10
    "’re"            0.08
    " wast"          0.08
    "forth"          0.08
    [not rendered]   0.08
    "రి"             0.07
    [not rendered]   0.07
    "сти"            0.07
    [not rendered]   0.07
    " wasting"       0.07
    Activation Density: 0.032%

    No Known Activations