INDEX
    Explanations

    words associated with health, wellness, and social interactions

    Negative Logits
    Token            Logit
    "105"            -0.17
    " ов"            -0.15
    "andra"          -0.14
    "lu"             -0.14
    "iegel"          -0.14
    "Stopping"       -0.14
    "halt"           -0.14
    "ozo"            -0.14
    "107"            -0.14
    "inkel"          -0.14
    Positive Logits
    Token            Logit
    " actionTypes"   0.15
    "uÄį"            0.15
    "iali"           0.15
    "_PACK"          0.15
    "\base"          0.15
    "ToAdd"          0.14
    "bero"           0.14
    "urum"           0.14
    "út"             0.14
    "------+------+" 0.14
    Activation Density: 0.037%

    No Known Activations