    Explanations

    negative connotations or criticisms related to experience and quality

    Negative Logits
     myſelf      -1.55
     Theſe       -1.52
     Efq         -1.51
     ―――――       -1.41
     pleaſure    -1.41
     Monfieur    -1.39
     faſt        -1.39
     itſelf      -1.38
     whoſe       -1.37
     becauſe     -1.35
    Positive Logits
    <eos>         1.32
    ↵↵            1.10
                  0.95
                  0.84
     The          0.80
     (            0.78
     In           0.73
     a            0.72
     the          0.70
     I            0.70
    Act Density 0.671%

    No Known Activations
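
The activation density reported above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading; it assumes "fires" means an activation strictly greater than zero, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active on a token when its activation
    is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [0.4, 1.2, 0.05]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.671% would correspond to the feature being active on roughly 7 of every 1000 tokens sampled.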