INDEX
    Explanations

    instances of the word "think" and its variations used to prompt reflection or consideration

    Negative Logits
    ãĤ¸ãĤ¢    -0.16
    ious      -0.16
    leton     -0.14
    vido      -0.14
    ldr       -0.14
    hle       -0.14
    ÏĨÏħ      -0.13
    iano      -0.13
    аÑĢÑĩ     -0.13
    رÙĪÙħ     -0.13
    Positive Logits
    tors      0.16
    ock       0.15
    æĭ¥       0.15
    amina     0.15
    erve      0.14
    orthand   0.14
    ascar     0.14
    LOAT      0.14
    lü       0.13
    ollo      0.13
    Activation Density 0.024%

    No Known Activations