    Explanations

    segments related to death or severe criminal actions

    Negative Logits
    Token         Logit
    "\{\\"        -0.61
    "<eos>"       -0.55
    " or"         -0.48
    " George"     -0.47
    " internet"   -0.47
    " Lu"         -0.47
    "Fns"         -0.47
    "↵↵"          -0.46
    "STAND"       -0.46
    "↵↵↵"         -0.45
    Positive Logits
    Token         Logit
    " pleaſure"   0.85
    " purpoſe"    0.75
    " ſind"       0.75
    " myſelf"     0.74
    " faſt"       0.74
    " ſtate"      0.74
    " iſt"        0.73
    " Anſ"        0.73
    " reaſon"     0.72
    " cauſe"      0.71
    Activation Density: 0.230%

    No Known Activations