INDEX
    Explanations

    proper nouns or names

    instances of specific letters or combinations of letters

    Negative Logits
    Token          Logit
    " flares"      -0.74
    " flare"       -0.68
    " pse"         -0.64
    " Rosenberg"   -0.61
    " Stub"        -0.61
    " Osw"         -0.61
    "arlane"       -0.59
    "DEP"          -0.59
    " challeng"    -0.59
    " contrace"    -0.58
    Positive Logits
    Token          Logit
    "ï¸ı"          0.91
    "oise"         0.78
    "edu"          0.75
    "é"            0.73
    "hai"          0.71
    "merce"        0.71
    "oir"          0.70
    " Rouge"       0.68
    "schild"       0.67
    "ailable"      0.67
    Act Density 0.129%

    No Known Activations