INDEX
    Explanations

    positive and negative sentiments towards various topics or entities

    phrases reflecting attitudes and evaluations, particularly about kindness and negativity

    Negative Logits
    UNCH            -0.78
    ĸļ              -0.73
     impossibility  -0.73
    utter           -0.70
    igsaw           -0.68
    rame            -0.66
    ulo             -0.62
    alsa            -0.60
    igs             -0.60
    rossover        -0.60
    Positive Logits
     toward     1.16
     towards    1.12
     Towards    0.86
     disposed   0.85
     relations  0.85
    itism       0.82
     attitude   0.77
     gays       0.73
    Semitic     0.71
     demeanor   0.70
    Activation Density: 0.440%

    No Known Activations