INDEX
    Explanations

    references to physical and mental well-being

    Negative Logits
    Token        Logit
    _physical    -0.17
    rog          -0.16
    çī©çIJĨ       -0.16
    Physics      -0.16
    erez         -0.16
     Physics     -0.15
    кав          -0.15
    aca          -0.15
    /goto        -0.15
    okit         -0.15
    Positive Logits
    Token        Logit
    ity          0.41
    ITY          0.29
    ities        0.26
    s            0.23
    /log         0.23
     therapist   0.23
    ized         0.22
     therapists  0.22
    /em          0.21
    mente        0.21
    Activation Density: 0.023%

    No Known Activations