    Negative Logits
     named          0.42
     firstly        0.42
     importantly    0.40
     primarily      0.40
     including      0.40
    |               0.39
     \|             0.38
     mainly         0.37
    كلمنا           0.37
     noise          0.37
    Positive Logits
     (“             1.55
     ("             1.40
    :“              1.40
     “…             1.39
                    1.33
     („             1.33
     "...           1.28
    :“              1.28
    —“              1.27
    :"              1.22
    Act Density 0.107%

    No Known Activations