INDEX
    Explanations

    phrases indicating direction or intention

    Negative Logits
    Token          Logit
    "/w"           -0.15
    "Ïīδ"          -0.15
    "imer"         -0.15
    "ares"         -0.14
    "-insert"      -0.14
    "hop"          -0.14
    "ÅĻÃŃklad"     -0.14
    "ovali"        -0.14
    "uforia"       -0.13
    "ek"           -0.13
    Positive Logits
    Token          Logit
    "GGLE"         0.17
    "/from"        0.17
    "ies"          0.16
    " Tow"         0.16
    "ement"        0.16
    " toward"      0.16
    "sWith"        0.15
    "/about"       0.15
    " towards"     0.14
    "roots"        0.14
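
Some token strings in the logit lists (for example `ÅĻÃŃklad`) look like the printable stand-ins that byte-level BPE vocabularies use for raw UTF-8 bytes. The sketch below assumes the GPT-2 style byte-to-character table; the page does not say which tokenizer produced these tokens, so both that assumption and the helper name `decode_token` are illustrative only. Strings that do not fit the table are returned unchanged.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable-character table in the GPT-2 byte-level BPE style (assumed)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are mapped to code points starting at 256.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: map a displayed token back to UTF-8 text.

    Returns the token unchanged if a character is outside the table or
    the recovered bytes are not valid UTF-8 (e.g. a partial character).
    """
    try:
        raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
        return raw.decode("utf-8")
    except (KeyError, UnicodeDecodeError):
        return token


for tok in ["ÅĻÃŃklad", "Ïīδ", "/w"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ÅĻÃŃklad` decodes to `říklad`, while `Ïīδ` contains a character outside the table and is left as-is rather than guessed at.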
    Act Density 0.026%
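
The page does not define "Act Density". A common reading is the fraction of sampled tokens on which the feature's activation is above zero; under that assumption, 0.026% corresponds to roughly 26 active tokens per 100,000. The function name, the threshold parameter, and the synthetic data below are hypothetical, chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition; the dashboard may compute density differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 26 active tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_974 + [1.5] * 26
density = activation_density(sample)
print(f"{100 * density:.3f}%")
```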

    No Known Activations