    Negative Logits
    " rising"      -0.08
    "Emer"         -0.08
    " الحكومية"    -0.08
    " العلمية"     -0.07
    "Welcome"      -0.07
    "Comic"        -0.07
    "Provide"      -0.07
    " Emerging"    -0.07
    " dini"        -0.07
    " emergence"   -0.07
    Positive Logits
    0.08
    β
    0.08
     β
    0.08
    _beta
    0.07
    0.07
     annoyance
    0.07
     четырех
    0.07
    _frag
    0.07
     Savior
    0.07
     koff
    0.07
    Activation Density: 0.001%

    No Known Activations