    NEGATIVE LOGITS
     comprend       -0.07
    ████            -0.07
     ماد            -0.07
     народу         -0.07
                    -0.06
     ý              -0.06
    _props          -0.06
    (transform      -0.06
     transporter    -0.06
     millennia      -0.06
    POSITIVE LOGITS
     thin           0.07
     Regents        0.07
     Kaepernick     0.06
     Gaza           0.06
    132             0.06
     пал            0.06
    aepernick       0.06
    unate           0.06
    acists          0.06
    Alias           0.06
    Activation Density: 0.001%

    No Known Activations