INDEX
    Explanations

    phrases indicating a state of perception or awareness

    Negative Logits
    Token           Logit
    "ëį°ìĿ´íĬ¸"     -0.15
    " vice"         -0.15
    " Vice"         -0.15
    ".Display"      -0.14
    "811"           -0.14
    " Falcon"       -0.14
    "Äĥr"           -0.14
    " ØŃÙĪ"         -0.14
    "ernet"         -0.14
    "jes"           -0.14

    Positive Logits
    Token           Logit
    "oulos"         0.16
    "habi"          0.15
    "-disable"      0.15
    "uzey"          0.14
    " gap"          0.14
    "[of"           0.13
    "arda"          0.13
    ".dsl"          0.13
    "__':č↵"        0.13
    "adora"         0.13
    Activation Density: 0.042%

    No Known Activations