INDEX
    Explanations

    words associated with confidence and physical attributes

    Negative Logits
    utex        -0.17
    ova         -0.17
    emean       -0.16
    xing        -0.16
    ude         -0.16
    833         -0.15
    ÃĹ↵↵        -0.15
    ey          -0.15
    liness      -0.14
    ged         -0.14
    Positive Logits
    soever      0.16
    ingham      0.16
    -mounted    0.15
    cards       0.15
    .bundle     0.15
    /column     0.14
    ables       0.14
    marks       0.14
    aci         0.13
    anghai      0.13
    Activation Density: 0.019%

    No Known Activations