INDEX
    Explanations

    phrases that prompt critical thinking or reflection

    Negative Logits
     Hacker       -0.15
    .named        -0.14
    extr          -0.14
    ̧             -0.14
    ighton        -0.14
    idal          -0.14
    yen           -0.14
    ellaneous     -0.13
    aphrag        -0.13
    ches          -0.13
    Positive Logits
    åIJ§          0.17
     McL          0.14
    AMESPACE      0.14
    (++           0.14
     yourself     0.14
    .poly         0.13
     though       0.13
    tout          0.13
     Escort       0.13
    asan          0.13
    Act Density 0.064%

    No Known Activations