    Explanations

    references to the pronoun "you."

    Negative Logits
    eer
    -0.19
    agas
    -0.16
    ↵↵
    -0.16
    aler
    -0.15
     Awareness
    -0.15
    gnore
    -0.15
    بان
    -0.14
    оит
    -0.14
    ango
    -0.14
     Knowledge
    -0.14
    Positive Logits
     know
    0.37
    know
    0.29
     Know
    0.25
    Know
    0.24
     knows
    0.20
    KN
    0.18
    知道
    0.17
     KNOW
    0.16
    知
    0.16
     mentioned
    0.15
    Act Density 0.028%

    No Known Activations