    Explanations

    statements emphasizing positivity and gratitude

    Negative Logits
    Token            Logit
    ".generated"     -0.15
    "bomb"           -0.15
    "bol"            -0.14
    " bol"           -0.14
    "584"            -0.13
    "itou"           -0.13
    " linker"        -0.13
    " undeniable"    -0.13
    "upe"            -0.13
    " dominated"     -0.13
    Positive Logits
    Token                Logit
    " why"               0.26
    " what"              0.23
    " something"         0.22
    "why"                0.22
    " exactly"           0.20
    "something"          0.19
    "what"               0.19
    "Characteristic"     0.18
    " characteristic"    0.18
    " precisely"         0.17
    Act Density 0.121%

    No Known Activations