INDEX
    Explanations

    CWE followed by numbers

    Negative Logits
     This       -1.96
    </h2>       -1.95
    </h1>       -1.92
     The        -1.91
     It         -1.62
     While      -1.54
     or         -1.44
     That       -1.44
    This        -1.42
     There      -1.42
    Positive Logits
     that        1.56
     desn        1.50
     obvio       1.48
     afront      1.48
     incrí       1.38
     jist        1.36
     récemment   1.34
                 1.34
     espect      1.34
    implies      1.33
    Activation Density 0.005%

    No Known Activations