INDEX
    Explanations

    pronouns and verb phrases that indicate actions taken by people

    Negative Logits
    "ãĥ©ãĤ¹"  -0.07
    "æIJ¬"  -0.07
    "erner"  -0.06
    "anj"  -0.06
    "ç´"  -0.06
    "lech"  -0.06
    "ESC"  -0.06
    "aus"  -0.06
    "astle"  -0.06
    "ÙĪØ§Ùĩ"  -0.06
    Positive Logits
    " doing"  0.11
    " done"  0.10
    " Doing"  0.09
    "Doing"  0.09
    " best"  0.09
    "doing"  0.08
    "best"  0.08
    " always"  0.07
    " Done"  0.07
    " did"  0.07
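
    Some tokens in the negative logits list (for example ãĥ©ãĤ¹ or ÙĪØ§Ùĩ) look like byte-level BPE display strings rather than readable text. The sketch below shows one way to turn them back into text, assuming the model uses a GPT-2-style byte-level tokenizer; this page does not state which tokenizer is in use, so treat that as an assumption. The helper names gpt2_byte_decoder and decode_token are hypothetical and not part of any library.

```python
def gpt2_byte_decoder():
    """Build the inverse of the GPT-2 byte-to-character table.

    Assumption: the tokens were rendered with the GPT-2 byte-level
    scheme, where printable bytes map to themselves and the remaining
    bytes map to code points starting at U+0100.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    chars = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            printable.append(b)
            chars.append(256 + shift)
            shift += 1
    # Keys are stand-in characters, values are raw byte values.
    return {chr(c): b for b, c in zip(printable, chars)}


_DECODER = gpt2_byte_decoder()


def decode_token(display: str) -> str:
    """Decode a byte-level display string to text.

    Characters outside the table (such as a literal space) are kept
    as-is; incomplete UTF-8 sequences become U+FFFD.
    """
    raw = b"".join(
        bytes([_DECODER[ch]]) if ch in _DECODER else ch.encode("utf-8")
        for ch in display
    )
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥ©ãĤ¹", "æIJ¬", "ÙĪØ§Ùĩ", "ç´", " doing"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

    A token such as ç´ holds only part of a multi-byte character, so it decodes to a replacement character instead of readable text. Plain ASCII tokens pass through unchanged.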
    Activation Density: 0.011%

    No Known Activations