    Negative Logits
    IsContent          -0.99
     Мексичка          -0.99
     pleaſure          -0.97
     myſelf            -0.97
     itſelf            -0.93
    ########.          -0.92
    SequentialGroup    -0.90
    +#+                -0.90
    dafx               -0.88
     Efq               -0.88

    Positive Logits
    ↵↵                  0.76
                        0.75
                        0.75
    '                   0.74
                        0.66
    <eos>               0.64
     (                  0.60
    a                   0.59
    -                   0.59
                        0.56
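The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). The page does not say how the scores are computed or scaled. A common approach is to project the feature's output direction through the model's unembedding matrix and keep the k lowest and k highest tokens. The sketch below illustrates that approach on made-up toy numbers. The names `top_logits`, `decoder_dir`, and `unembed` are hypothetical and do not come from this page.

```python
def top_logits(decoder_dir, unembed, vocab, k=3):
    """Rank tokens by the dot product of a feature direction with each unembedding column.

    decoder_dir: list of d_model floats (hypothetical feature output direction)
    unembed: d_model rows, each a list of vocab_size floats (hypothetical W_U)
    vocab: token strings, one per column of unembed
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder_dir entry")
    # Logit contribution of the feature direction to each token.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most suppressed tokens, lowest score first
    positive = ranked[::-1][:k]    # most promoted tokens, highest score first
    return negative, positive


# Toy example with a 2-dimensional model and a 4-token vocabulary.
neg, pos = top_logits(
    [1.0, -0.5],
    [[0.2, -0.4, 0.9, 0.0],
     [0.4, 0.2, -0.2, 0.6]],
    ["a", "b", "c", "d"],
    k=2,
)
print([t for t, _ in neg])  # ['b', 'd']
print([t for t, _ in pos])  # ['c', 'a']
```

Any normalization applied to the displayed values (for example, rescaling into the roughly -1 to 1 range seen above) would be an extra step that this sketch does not model.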
    Act Density 0.023%
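Act Density is usually reported as the percentage of tokens on which a feature is active. At 0.023%, that is roughly 23 of every 100,000 tokens. The sketch below shows this arithmetic. It assumes "active" means an activation strictly above zero, because the page states neither the threshold nor the token sample. The helper `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds threshold (assumed > 0)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: the feature fires on 23 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(23):
    acts[i * 1000] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.023%
```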

    No Known Activations