INDEX
    Explanations

    single uppercase letters or acronyms in a particular context

    capital letters or proper nouns

    Negative Logits
    Token       Logit
    )=(         -0.79
    hire        -0.78
    bent        -0.73
    xon         -0.73
    negie       -0.72
    REDACTED    -0.67
    come        -0.64
    crim        -0.64
    orio        -0.64
    bring       -0.63
    Positive Logits
    Token       Logit
    umps        0.94
    ucks        0.85
    ippers      0.82
    oses        0.79
    ixture      0.78
    enses       0.77
    umper       0.76
    leep        0.75
    agging      0.75
    oots        0.74
    Act Density 0.149%

    No Known Activations