INDEX
    Negative Logits
     mistakes             -0.07
     Platt                -0.07
     сохра                -0.06
     interpolation        -0.06
    (no token text)       -0.06
     حمایت                -0.06
    _makeConstraints      -0.06
     Oyun                 -0.06
     Benn                 -0.06
     Leisure              -0.06
    Positive Logits
    ank                   0.07
    052                   0.07
     induced              0.07
    pokemon               0.06
     abc                  0.06
     :\                   0.06
    .</                   0.06
    ayan                  0.06
    Includes              0.06
    (no token text)       0.06
    Activation Density 0.004%

    No Known Activations