INDEX
    Explanations

    the presence of the word "you" and its associated forms in various contexts

    Negative Logits
    pieces  -0.17
    νε  -0.17
    ustos  -0.15
    enet  -0.15
    ensa  -0.15
     activations  -0.14
    orsi  -0.14
    ç¼ĺ  -0.14
     سخ  -0.14
    oud  -0.14
    Positive Logits
    ìĬĪ  0.15
    á»ĩ  0.14
     Drain  0.14
    èĻ«  0.14
    092  0.14
    RAIN  0.14
    anela  0.14
    942  0.14
    941  0.14
     jeu  0.14
    Activation Density: 0.002%

    No Known Activations