INDEX
    Explanations

    proper nouns referencing sports teams and their affiliations

    Negative Logits
    lessness      -0.16
    нев           -0.15
    apsed         -0.15
    umbed         -0.14
     Institutes   -0.14
    ourcem        -0.14
    ureau         -0.14
    icho          -0.14
    hausen        -0.14
    ogui          -0.14
    Positive Logits
     faithful     0.20
    ettes         0.18
    们            0.17
    urs           0.15
    ies           0.15
    birds         0.15
     themselves   0.15
    gs            0.15
    Bias          0.15
    les           0.14
    Activation Density: 0.035%

    No Known Activations