    Explanations

    sentences that express opinions or reflections on societal issues

    Negative Logits
    Token               Logit
    "=$?"               -1.04
    " oprot"            -0.93
    " pleaſure"         -0.92
    "ſelf"              -0.87
    " myſelf"           -0.86
    "theless"           -0.86
    " ProtoMessage"     -0.86
    "🏻‍♀️"               -0.83
    "yntaxException"    -0.82
    " Majefty"          -0.82
    Positive Logits
    Token               Logit
    "n"                 0.64
    " I"                0.63
    "<"                 0.57
    " what"             0.57
    "I"                 0.55
    " ("                0.55
    "↵↵"                0.55
    " And"              0.54
    "And"               0.54
    "What"              0.54
    Activation Density: 0.287%

    No Known Activations