INDEX
    Explanations

    words related to personal attributes and actions, particularly in contexts of relationships and roles

    Negative Logits
    " shields"     -0.15
    "inish"        -0.15
    " affected"    -0.14
    "aye"          -0.14
    " Shield"      -0.14
    "affected"     -0.14
    "мÑĥ"          -0.14
    " ฿"           -0.14
    "\CMS"         -0.13
    "fal"          -0.13
    Positive Logits
    "ané"          0.17
    "aroo"         0.16
    "ahoo"         0.16
    " éķ·"         0.15
    "apol"         0.15
    "Beat"         0.15
    "arna"         0.15
    "terdam"       0.15
    "IMIT"         0.15
    "beat"         0.14
    Activation Density: 0.017%

    No Known Activations