    Explanations

    proper names, particularly those related to individuals and their affiliations

    Negative Logits
    eter       -0.93
    arian      -0.88
    ition      -0.83
    axter      -0.82
    iaries     -0.81
    entimes    -0.78
    ary        -0.78
    erry       -0.77
    eln        -0.76
    VB         -0.76
    Positive Logits
    xia         0.75
    cipline     0.74
    ゼウス      0.74
    =-=-        0.72
     ��������   0.71
    @@@@        0.70
    ドラ        0.70
    テ          0.70
     sshd       0.70
    66666666    0.69
    Activation Density: 0.031%

    No Known Activations