    Explanations

    phrases related to expressing disappointment or disapproval

    negative statements related to actions or behaviors

    Negative Logits
     Marina          -0.62
     Juda            -0.60
     gradually       -0.60
     periodically    -0.59
    udo              -0.58
     Kafka           -0.57
     Kraft           -0.56
     Abbas           -0.56
     Liberty         -0.55
     Parad           -0.55
    Positive Logits
     anymore         1.41
    âĢ               1.28
    ̶                1.17
    âľ               1.13
    *.               1.09
    \.               1.04
    âĺ               1.03
    âķ               1.03
    ãĢ               1.03
     nor             1.02
    Activation Density 0.506%

    No Known Activations