    Explanations

    commands or directives that draw attention

    Negative Logits
    uner        -0.18
    eka         -0.17
    ioned       -0.16
    ãĥķãĥĪ      -0.16
    626         -0.16
    ê³          -0.15
    ropolis     -0.15
    ivirus      -0.15
    cott        -0.15
    егоÑĢ       -0.15
    Positive Logits
     closely     0.25
     no          0.22
     look        0.19
     familiar    0.18
     Fam         0.18
     how         0.18
     Look        0.17
     ma          0.17
     Sharp       0.17
     LOOK        0.17
    Activation Density: 0.014%

    No Known Activations