INDEX
    Explanations

    words indicating positive feelings or states of being

    Negative Logits
    "onest"     -0.15
    "urette"    -0.14
    "las"       -0.14
    " العم"     -0.14
    " Wass"     -0.14
    "anou"      -0.14
    "τέ"        -0.14
    "'gc"       -0.14
    "емо"       -0.14
    "estroy"    -0.14
    Positive Logits
    "igu"       0.17
    "据"        0.16
    "inka"      0.15
    "_BT"       0.15
    " bearing"  0.15
    " bear"     0.15
    "EEK"       0.15
    "-toast"    0.14
    "icina"     0.14
    "lij"       0.14
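The logit lists above show which output tokens this feature most directly pushes down (negative) or up (positive). Below is a minimal sketch of how such lists are commonly derived. It assumes the feature comes from a sparse autoencoder whose decoder vector is projected straight through the model's unembedding matrix, ignoring any final layer norm. The function and argument names are hypothetical and do not come from this dashboard's actual code.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank tokens by the direct effect of one feature's decoder direction on the logits.

    Hypothetical inputs (assumed shapes):
      w_dec_row: (d_model,) decoder vector for the feature
      w_unembed: (d_model, d_vocab) unembedding matrix
      vocab:     list of d_vocab token strings
    """
    effects = w_dec_row @ w_unembed          # (d_vocab,) logit change per token
    order = np.argsort(effects)              # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.1, -0.2, 0.3], [0.5, 0.5, 0.5]]),
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

In the toy example, token "b" receives the most negative effect and token "c" the most positive one.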
    Activation Density: 0.002%
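Activation density is the share of tokens on which the feature fires. A density of 0.002% works out to roughly 2 firing tokens in every 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0. This page does not state the dashboard's exact threshold or the size of its token sample.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation for one feature exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy sample: 100,000 tokens, the feature fires on exactly 2 of them.
sample = np.zeros(100_000)
sample[:2] = 1.0
print(f"{activation_density(sample) * 100:.3f}%")  # 0.002%
```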

    No Known Activations