INDEX
    Explanations

    phrases related to labeling and classification

    references to labels and labeling practices

    NEGATIVE LOGITS
    Token           Logit
    "tein"          -0.70
    "issance"       -0.67
    " Globe"        -0.66
    " Yin"          -0.64
    "vati"          -0.64
    "ashington"     -0.64
    "ctica"         -0.63
    "gm"            -0.63
    "olars"         -0.62
    "hire"          -0.62
    POSITIVE LOGITS
    Token           Logit
    " label"         0.90
    "mates"          0.89
    " labels"        0.87
    "mate"           0.85
    "cloth"          0.80
    "strip"          0.76
    "label"          0.76
    "Label"          0.75
    "red"            0.75
    "mark"           0.74
    Act Density 0.022%

    No Known Activations