INDEX
    Explanations

    instances of guidance or empowerment related to self-improvement, resources, or education

    Negative Logits
    "avec"        -0.14
    "ká"          -0.14
    " Sunder"     -0.14
    "ysz"         -0.14
    "shit"        -0.14
    ".df"         -0.13
    "ierce"       -0.13
    "rosso"       -0.13
    " overall"    -0.13
    "borrow"      -0.13
    Positive Logits
    " Wis"            0.27
    " jud"            0.26
    " advantage"      0.25
    " differently"    0.23
    " wisely"         0.23
    "wis"             0.22
    " wisdom"         0.20
    " towards"        0.20
    " smart"          0.20
    " Advantage"      0.20
    Act Density 0.092%

    No Known Activations