INDEX
    Explanations

    instances of the term "label" within the text

    Negative Logits
    "+#+"         -0.59
    " baby"       -0.54
    " myſelf"     -0.53
    " passwords"  -0.53
    "ghijklmnop"  -0.53
    "iſten"       -0.52
    " pleaſure"   -0.52
    " cryst"      -0.52
    " credit"     -0.52
    " fluid"      -0.51
    Positive Logits
    "label"       1.08
    " label"      0.95
    " labels"     0.80
    "Label"       0.80
    "labels"      0.70
    " Label"      0.70
    "LABEL"       0.69
    " Labels"     0.68
    " etiqueta"   0.67
    " LABEL"      0.63
    Act Density 0.236%

    No Known Activations