INDEX
    Explanations

    statements regarding societal issues and interactions among people

    Negative Logits
    ิ้
    -0.16
     Sense
    -0.16
    zyst
    -0.16
    NB
    -0.15
    INED
    -0.15
    idel
    -0.15
    ucha
    -0.15
     Gazette
    -0.15
    refix
    -0.15
    pNext
    -0.14
    Positive Logits
    νομ
    0.16
    рей
    0.15
     Graves
    0.15
    olis
    0.15
    alta
    0.15
     hers
    0.14
     ours
    0.14
    ख
    0.14
    ディア
    0.14
    yw
    0.13
    Activation Density 0.289%

    No Known Activations