INDEX
    Explanations

    phrases that refer to monitoring or paying attention

    Negative Logits
    Token         Logit
    "iro"         -0.16
    "æĭľ"         -0.15
    "inho"        -0.15
    "-Ta"         -0.15
    " trous"      -0.14
    "iros"        -0.14
    "775"         -0.14
    "ittest"      -0.13
    "760"         -0.13
    "mediately"   -0.13
    Positive Logits
    Token         Logit
    " eye"        0.40
    " tabs"       0.35
    " track"      0.34
    " close"      0.31
    " Eye"        0.30
    "eye"         0.29
    "-eye"        0.27
    "Eye"         0.27
    " watch"      0.27
    "close"       0.26
    Activation Density: 0.029%

    No Known Activations