    Negative Logits
     abhor          0.43
     ridicu         0.41
     sexist         0.40
     colis          0.40
     ridiculous     0.38
     playthrough    0.36
    TextBoxColumn   0.36
     leid           0.36
     uhr            0.36
    ElementChild    0.35
    Positive Logits
    <h2>            0.38
    In              0.37
    Using           0.36
    ![              0.36
    Why             0.36
    Positive        0.35
    Sure            0.35
    Our             0.35
    <h4>            0.35
    ##              0.34
    Activation Density: 0.004%

    No Known Activations