    Negative Logits
    Token               Logit
    " improper"         -0.07
    "-heavy"            -0.06
    " borne"            -0.06
    "Lazy"              -0.06
    " burnt"            -0.06
    " black"            -0.06
    ".mc"               -0.06
    "atype"             -0.06
    "дал"               -0.06
    ".car"              -0.06
    Positive Logits
    Token               Logit
    " Rifle"            0.07
    " dildo"            0.07
    "PropertyParams"    0.07
    " privile"          0.07
    " Shots"            0.06
    "ocusing"           0.06
    (whitespace-only)   0.06
    "hashtags"          0.06
    " dokument"         0.06
    "_SAMPL"            0.06
    Activation Density: 0.029%

    No Known Activations