INDEX
    Explanations

    text related to physical instructions or steps

    Negative Logits
        "!."         -0.77
        "%."         -0.69
        "$."         -0.66
        ",..."       -0.65
        "+."         -0.64
        "*."         -0.63
        " although"  -0.63
        "'."         -0.63
        "';"         -0.61
        "HY"         -0.61
    Positive Logits
        "pires"          0.89
        " depends"       0.73
        " constitutes"   0.72
        " entails"       0.69
        "pired"          0.69
        " involves"      0.68
        " isn"           0.66
        " mattered"      0.63
        " varies"        0.62
        " implies"       0.62
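Dashboards of this kind commonly build the positive and negative logit lists by projecting a feature's decoder direction through the model's unembedding matrix, then keeping the highest and lowest scoring tokens. Whether this page uses exactly that projection is an assumption. The sketch below shows only the ranking step on a toy six-token vocabulary with random weights. The names `logit_effects`, `top_logit_effects`, `decoder_dir` and `W_U` are hypothetical.

```python
import random

def logit_effects(decoder_dir, W_U):
    """Score each token as decoder_dir @ W_U (assumed projection).

    W_U is a list of rows, one per hidden dimension; each row holds
    one entry per vocabulary token.
    """
    vocab_size = len(W_U[0])
    return [sum(d * row[t] for d, row in zip(decoder_dir, W_U))
            for t in range(vocab_size)]

def top_logit_effects(decoder_dir, W_U, vocab, k=3):
    """Return (positive, negative) lists of (token, score) pairs."""
    effects = logit_effects(decoder_dir, W_U)
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative

# Toy data: hidden size 4, six-token vocabulary (hypothetical, not real weights).
rng = random.Random(0)
vocab = ["!.", " depends", " involves", "%.", " entails", "HY"]
W_U = [[rng.gauss(0, 1) for _ in vocab] for _ in range(4)]
decoder_dir = [rng.gauss(0, 1) for _ in range(4)]

pos, neg = top_logit_effects(decoder_dir, W_U, vocab, k=3)
for tok, val in pos:
    print(f"+ {tok!r}: {val:.2f}")
for tok, val in neg:
    print(f"- {tok!r}: {val:.2f}")
```

Sorting once and slicing from both ends keeps the two lists consistent with each other. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid sorting everything.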
    Activation Density: 3.163%
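Activation density is usually reported as the share of sampled token positions on which a feature's activation is positive. Under that assumed definition, 3.163% means the feature fires on roughly 3 of every 100 tokens. The following is a minimal sketch with made-up activations. The function name `activation_density` and its zero `threshold` default are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds threshold.

    The zero threshold is an assumption; a dashboard may use another cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Hypothetical activations for 8 token positions (not real dashboard data).
acts = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.4, 0.0]
density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # Activation density: 25.000%
```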

    No Known Activations