INDEX
    Explanations

    proper names, particularly those of political figures and entities

    Negative Logits
    "aylor"        -0.08
    "ernes"        -0.08
    "andes"        -0.07
    "ulet"         -0.07
    "ushima"       -0.07
    "iglia"        -0.07
    "ople"         -0.07
    "iola"         -0.07
    "apus"         -0.07
    "κοÏį"         -0.07

    Positive Logits
    "LOY"           0.06
    " Wire"         0.06
    "ationToken"    0.05
    "ê·¼"           0.05
    "431"           0.05
    "512"           0.05
    "εδ"            0.05
    "938"           0.05
    " Caribbean"    0.05
    "pillar"        0.05
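    The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The sketch below shows one common way such lists are produced on SAE dashboards. It assumes the score is the feature's decoder direction multiplied by the model's unembedding matrix; the helper name `top_logit_effects` and its parameters are hypothetical, not taken from any particular library.

```python
import numpy as np


def top_logit_effects(decoder_vec, unembed, vocab, k=10):
    """Hypothetical helper: rank tokens by a feature's direct logit effect.

    decoder_vec: shape (d_model,), the feature's decoder direction (assumed input)
    unembed:     shape (d_model, vocab_size), the unembedding matrix (assumed input)
    vocab:       list of token strings, indexed by token id
    """
    # Project the decoder direction onto every vocabulary logit.
    logit_effect = decoder_vec @ unembed          # shape (vocab_size,)
    # Sort token ids from most negative to most positive effect.
    order = np.argsort(logit_effect)
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage: an identity unembedding makes each logit effect equal one decoder entry.
neg, pos = top_logit_effects(
    np.array([0.3, -0.2, 0.1, -0.5]), np.eye(4), ["a", "b", "c", "d"], k=2
)
print("negative:", neg)
print("positive:", pos)
```

    With real weights, `decoder_vec` would be one row of the SAE decoder and `unembed` the model's unembedding matrix; both are assumptions about how this page's numbers were computed.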
    Activation Density 0.001%
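    Assuming the density figure is the fraction of sampled tokens on which the feature's activation is above zero (the usual meaning on SAE dashboards), 0.001% equals 1e-5, or roughly one activating token per 100,000. A minimal sketch of that calculation follows; the function name `activation_density` and the zero threshold are illustrative assumptions.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of tokens where the activation exceeds threshold."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))


# Toy data: 100,000 tokens, with the feature firing on exactly one of them.
acts = np.zeros(100_000)
acts[123] = 2.5
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```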

    No Known Activations