INDEX
    Explanations

    items that have been specifically marked or identified with a label

    terms related to labeling or categorizing objects or ideas

    Negative Logits
    "ramid"     -0.75
    "=-=-"      -0.73
    "ppa"       -0.72
    "yre"       -0.72
    "perty"     -0.70
    "vati"      -0.69
    "abama"     -0.68
    "hire"      -0.67
    " compr"    -0.67
    "vous"      -0.67
    Positive Logits
    "phas"         0.85
    "ging"         0.69
    " unfit"       0.68
    "ged"          0.67
    "own"          0.67
    " labelled"    0.67
    " labeled"     0.66
    " labeling"    0.64
    " loyalty"     0.64
    " branded"     0.63
    Activation Density: 0.034%

    No Known Activations