INDEX
    Explanations

    mentions of specific individuals in a context related to sports or activities

    instances of proper nouns or specific entities

    Negative Logits
    Token         Logit
    "bilt"        -0.84
    "TAG"         -0.74
    "ographed"    -0.67
    "angular"     -0.66
    "liest"       -0.66
    " dreaded"    -0.63
    "drawn"       -0.59
    "NECT"        -0.59
    " Sinn"       -0.59
    "ROR"         -0.58
    Positive Logits
    Token         Logit
    "ktop"        0.81
    "utsche"      0.76
    "abase"       0.75
    "onga"        0.73
    "yu"          0.72
    "hack"        0.68
    "ilib"        0.67
    "endas"       0.67
    "aru"         0.66
    "ushi"        0.65
    Activation Density: 0.230%

    No Known Activations