INDEX
    Explanations
    NEGATIVE LOGITS
     preserve       -0.08
    ERV             -0.08
    bild            -0.08
    Instagram       -0.08
    bilder          -0.08
    Netflix         -0.08
    950             -0.08
     WIN            -0.08
     porno          -0.07
     Moodle         -0.07
    POSITIVE LOGITS
     круг           0.09
     appreci        0.08
     analyt         0.08
     físicos        0.08
     aggravated     0.08
     tetra          0.08
     exposition     0.07
    -solving        0.07
     perspectivas   0.07
     mathematic     0.07
    Act Density 0.008%

    No Known Activations