INDEX
    Explanations

    words and phrases that express negative feelings, disapproval, or unethical behavior

    negative sentiment

    Negative Logits
    -0.49
    dr      -0.47
     p      -0.47
    pex     -0.46
     La     -0.46
    dev     -0.44
     Ber    -0.44
    ↵↵      -0.44
     N      -0.44
    Int     -0.44
    Positive Logits
     itſelf        0.90
     Majefty       0.88
     pleaſure      0.83
     Monfieur      0.81
     Jefus         0.80
     ―――――         0.78
     fubject       0.78
     ſever         0.78
    RenderAtEndOf  0.78
     doubtnut      0.77
    Act Density 2.317%

    No Known Activations