INDEX
    Explanations

    descriptors related to artistic, social, and cultural attributes

    Negative Logits
    Token          Logit
    "onym"         -0.15
    "473"          -0.14
    "oro"          -0.14
    "ord"          -0.14
    " Roy"         -0.14
    " Carlson"     -0.14
    "rido"         -0.14
    " T"           -0.14
    "408"          -0.14
    "474"          -0.13
    Positive Logits
    Token          Logit
    " nature"      0.33
    " approach"    0.27
    "nature"       0.27
    "ness"         0.24
    " aspects"     0.22
    " aspect"      0.21
    "appro"        0.21
    " streak"      0.20
    " بودن"        0.20
    " Approach"    0.20
    Activation Density: 0.276%

    No Known Activations