INDEX
    Explanations

    phrases that denote uniqueness or exceptional qualities

    Negative Logits
    rome
    -0.17
    tera
    -0.16
    inel
    -0.15
    anela
    -0.15
    inke
    -0.14
    acer
    -0.14
    dg
    -0.14
    iskey
    -0.14
    ť
    -0.14
     Shame
    -0.14
    Positive Logits
     ordinary
    0.44
    ordinary
    0.36
    普通
    0.35
     usual
    0.33
     typical
    0.33
     обыч
    0.31
    usual
    0.31
     Ordinary
    0.31
     normal
    0.29
     普通
    0.28
    Activation Density: 0.075%

    No Known Activations