INDEX
    Explanations

    references to people's feelings and perceptions

    Negative Logits
    eyle          -0.17
    entifier      -0.16
    eson          -0.16
    lassen        -0.15
    earn          -0.15
    amage         -0.15
    sla           -0.15
    RowAt         -0.15
    sian          -0.15
    алеж          -0.14
    Positive Logits
    år            0.15
    -ra           0.14
     who          0.13
    oha           0.13
     preferred    0.13
    AILS          0.13
     Heg          0.13
    MinMax        0.13
    raised        0.13
    èĨ            0.13
    Activation Density: 0.088%

    No Known Activations