INDEX
    Explanations

    positive attributes and states

    Negative Logits
    "R"       0.51
    "lf"      0.48
    " הז"     0.43
    "CT"      0.41
    "N"       0.39
    "K"       0.39
    "Z"       0.39
    "S"       0.39
    "Y"       0.38
    "J"       0.38
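    Positive Logits
    " on"       0.50
    "in"        0.49
    "对待"      0.45   (Chinese, "to treat")
    " while"    0.44
    "ѝ"         0.42
    " enough"   0.41
    "eness"     0.41
    " selama"   0.40   (Malay/Indonesian, "during")
    " kwenye"   0.40   (Swahili, "on, in, at")
    " على"      0.40   (Arabic, "on, upon")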
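    The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists are commonly produced, assuming this is a sparse-autoencoder feature dashboard and that each score is the feature's decoder direction projected directly through the unembedding matrix (no layer norm or bias folded in). The names `top_logit_tokens`, `decoder_dir`, and `W_U` and all toy values are hypothetical, not taken from the dashboard.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption (not from the dashboard): the effect on token t is
    decoder_dir @ W_U[:, t], with no layer norm scaling or bias included.
    """
    effects = decoder_dir @ W_U            # shape (d_vocab,)
    order = np.argsort(effects)            # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: d_model = 2, four-token vocabulary (all values hypothetical).
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -0.2, 0.1, 0.3],
                [9.0, 9.0, 9.0, 9.0]])
decoder_dir = np.array([1.0, 0.0])         # only reads the first residual dimension
positive, negative = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print(positive)  # [('a', 0.5), ('d', 0.3)]
print(negative)  # [('b', -0.2), ('c', 0.1)]
```

    Because only the direct path is counted, these scores describe what the feature writes toward the output vocabulary, not what makes it fire.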
    Activation Density: 0.144%
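    Activation density is conventionally the share of token positions on which the feature fires, so 0.144% corresponds to roughly 1.44 active tokens per 1,000. A minimal sketch, assuming "active" means an activation strictly above zero (as for a ReLU-style feature); `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where a feature's activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (default 0).
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 144 active positions out of 100,000 reproduces a 0.144% density.
acts = np.zeros(100_000)
acts[:144] = 1.0
print(activation_density(acts))  # 0.144
```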

    No Known Activations