    NEGATIVE LOGITS
    " demonstrators"     -0.06
    " drowned"           -0.06
    "lates"              -0.06
    (token not shown)    -0.06
    " hostage"           -0.06
    "_projection"        -0.06
    " теат"              -0.06
    ":\/\/"              -0.06
    " baked"             -0.06
    "_sample"            -0.05
    POSITIVE LOGITS
    " process"           0.08
    (token not shown)    0.07
    ".Video"             0.07
    " okolí"             0.07
    "FORCE"              0.06
    (token not shown)    0.06
    "TouchUpInside"      0.06
    "илась"              0.06
    " baptism"           0.06
    (token not shown)    0.06
    Act Density 0.049%

    No Known Activations