INDEX
    Explanations

    The word "so", on which the feature activates strongly

    Phrases that express negation or contradiction

    Negative Logits
    "ーティ"          -0.85
    "é¾"             -0.69   (incomplete UTF-8 sequence, bytes E9 BE)
    "MAP"            -0.68
    "ゼウス"          -0.64
    " annotations"   -0.63
    "ayne"           -0.60
    " {:"            -0.57
    " SHARES"        -0.57
    "DERR"           -0.57
    " scan"          -0.56
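    The Japanese entries in the negative-logit list are stored in a GPT-2-style byte-level BPE vocabulary, where every byte is mapped to a printable character (a leading space, for example, appears as "Ġ"), so a token such as ãĥ¼ãĥĨãĤ£ is really ーティ. Assuming the model uses GPT-2's byte-to-character table (the dashboard does not name its tokenizer), the sketch below reverses that mapping. The helper names bytes_to_unicode and decode_token are illustrative; the table construction mirrors the one in GPT-2's released encoder.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table used by GPT-2's byte-level BPE."""
    # Bytes that are already printable Latin-1 characters map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every other byte (space, control characters, ...) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text (hypothetical helper)."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # errors="replace" inserts U+FFFD when a token holds only part of a
    # multi-byte character, as "é¾" (bytes E9 BE) does.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ¼ãĥĨãĤ£"))     # ーティ
print(decode_token("ãĤ¼ãĤ¦ãĤ¹"))     # ゼウス
print(repr(decode_token("Ġannotations")))  # ' annotations'
```

    Tokens that split a character across BPE merges, like "é¾", cannot be decoded on their own; only the neighbouring token supplies the missing byte.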
    Positive Logits
    " much"          0.86
    "aked"           0.84
    "oths"           0.82
    "oooo"           0.81
    "ppy"            0.80
    "othes"          0.77
    "zin"            0.76
    "akers"          0.74
    "icably"         0.73
    "oooooooo"       0.73
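    Dashboards of this kind commonly build the two logit lists by multiplying the feature's decoder direction by the model's unembedding matrix W_U and reporting the most positive and most negative entries. The page does not document how its scores were computed or whether they are normalized, so the following is a minimal pure-Python sketch under that assumption, with a hypothetical two-dimensional residual stream and four-token vocabulary.

```python
def logit_effects(w_dec, w_u, vocab):
    """Project a feature's decoder direction onto the unembedding.

    w_dec: list of d_model floats (the feature's decoder row).
    w_u:   d_model x vocab_size matrix, given as a list of rows.
    Returns (token, score) pairs, one per vocabulary entry.
    """
    scores = [
        sum(w_dec[i] * w_u[i][t] for i in range(len(w_dec)))
        for t in range(len(vocab))
    ]
    return list(zip(vocab, scores))


def top_and_bottom(effects, k):
    """Return the k highest-scoring and k lowest-scoring tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    # Highest first for the positive list, most negative first for the negative list.
    return ranked[-k:][::-1], ranked[:k]


# Hypothetical toy values, not taken from the dashboard.
vocab = ["so", "not", "cat", "dog"]
w_dec = [1.0, 0.0]
w_u = [
    [0.9, -0.2, 0.1, -0.5],
    [0.3, 0.3, 0.3, 0.3],
]

top, bottom = top_and_bottom(logit_effects(w_dec, w_u, vocab), k=2)
print(top)     # [('so', 0.9), ('cat', 0.1)]
print(bottom)  # [('dog', -0.5), ('not', -0.2)]
```

    A positive score means that adding the feature's direction to the residual stream raises that token's logit directly, ignoring any later layers; real dashboards may also fold in a final layer norm, which this sketch omits.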
    Activation Density 0.061%
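    Activation density is usually reported as the fraction of sampled tokens on which the feature's activation is above zero; 0.061% works out to roughly 61 tokens per 100,000, or about one in 1,640. Assuming that definition (the dashboard does not state its threshold or sample size), a sketch with hypothetical activation data and a hypothetical threshold parameter:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature fires (activation > threshold)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 61 of 100,000 tokens.
acts = [0.0] * 99_939 + [1.2] * 61
print(f"{activation_density(acts):.3%}")  # 0.061%
```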

    No Known Activations