INDEX
    Explanations

    instances of the word "try."

    Negative Logits
    "benefit"     -0.60
    " stink"      -0.57
    " mole"       -0.56
    "hop"         -0.55
    "icipated"    -0.55
    "hate"        -0.54
    " Beir"       -0.54
    " nom"        -0.53
    "cele"        -0.53
    " suits"      -0.53
    Positive Logits
    " again"      0.86
    "ļéĨĴ"        0.78
    "again"       0.72
    " Again"      0.71
    " Ctrl"       0.71
    "ãĥĥãĥī"      0.71
    "wcsstore"    0.69
    " harder"     0.67
    "Recommend"   0.66
    "rex"         0.65
    Activation Density: 0.013%

    No Known Activations