INDEX
    Explanations

    proper nouns, particularly names of organizations and media sources

    Negative Logits
    Token           Logit
    "s"             -0.15
    " lou"          -0.14
    " end"          -0.14
    "oub"           -0.14
    "aper"          -0.14
    " rage"         -0.14
    ".__"           -0.14
    "онÑĮ"          -0.14
    "anking"        -0.14
    " bol"          -0.13
    Positive Logits
    Token           Logit
    " enumerator"    0.16
    "ails"           0.15
    "anza"           0.15
    "_alias"         0.15
    "ÎŃαÏĤ"          0.14
    "-wsj"           0.14
    "plit"           0.14
    "éļª"            0.14
    "izard"          0.14
    "HIR"            0.14
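The dashboard lists these logits without saying how they were derived. A common approach in sparse-autoencoder dashboards is to project a feature's decoder direction onto the model's unembedding matrix, then report the most negative and most positive entries. The sketch below assumes that method. The names `decoder_dir`, `unembed`, `logit_effects` and `top_and_bottom` are hypothetical, and the toy vectors are invented for illustration. They are not this feature's weights.

```python
from typing import Dict, List, Sequence, Tuple

Effect = Tuple[str, float]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector.

    Assumption (hypothetical): logit effect of token t = decoder_dir . W_U[:, t].
    """
    if any(len(col) != len(decoder_dir) for col in unembed.values()):
        raise ValueError("unembedding vectors must match the decoder dimension")
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Effect], List[Effect]]:
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]                   # ascending: most negative first
    positive = list(reversed(ranked))[:k]   # descending: most positive first
    return negative, positive


# Toy 2-dimensional example; these vectors are made up, not model weights.
decoder_dir = [1.0, -0.5]
unembed = {" a": [0.2, 0.0], " b": [0.0, 0.4], " c": [0.05, 0.0]}
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(neg, pos)
```

A single sort followed by slicing both ends keeps the two lists consistent with each other. For a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` would avoid the full sort.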
    Activation Density: 0.019%
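Activation density is usually reported as the share of sampled token positions on which a feature's activation is above zero. The dashboard states neither its threshold nor its sample size. If we assume that definition, 0.019% works out to about 19 firing positions per 100,000 tokens. The minimal sketch below uses `activation_density`, a hypothetical helper, with a zero threshold as the assumed default.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = (#positions firing) / (#positions sampled).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy sample: 100,000 token positions, 19 of which fire.
toy = [0.0] * 99_981 + [1.5] * 19
print(f"{activation_density(toy):.3f}%")  # prints 0.019%
```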

    No Known Activations