    Negative Logits
    boxes         -0.08
     warriors     -0.07
     בכלל         -0.07
    _assigned     -0.07
    alfa          -0.07
     även         -0.07
     yetiştir     -0.07
     unnatural    -0.07
    ppers         -0.07
    eff           -0.07
    Positive Logits
     предназ      0.08
    (de           0.07
    Reducer       0.07
    (border       0.07
    _IDLE         0.07
    iom           0.07
    קש            0.07
                  0.07
    Constructed   0.07
                  0.07
    Activation Density: 0.012%

    No Known Activations