INDEX
    Explanations
    Negative Logits
     пап        -0.08
     Videos     -0.07
    ncia        -0.07
     Nguyen     -0.07
     truthful   -0.06
     antic      -0.06
     Stars      -0.06
    iki         -0.06
                -0.06
    Fake        -0.06
    Positive Logits
    xmin        0.07
     accr       0.06
     ValueType  0.06
     typeid     0.06
     crushed    0.06
    insi        0.05
    UNUSED      0.05
     THROW      0.05
    øre         0.05
    .ul         0.05
    Act Density 0.008%

    No Known Activations