    Explanations

    punctuation or structural elements indicating the end of a thought or sentence

    Negative Logits
     itſelf      -0.91
     ſta         -0.83
     myſelf      -0.83
     ſeveral     -0.81
     ſtill       -0.80
     themſelves  -0.79
     uſed        -0.79
     ſte         -0.78
     Monfieur    -0.77
     ſtand       -0.76

    Positive Logits
    <bos>        1.16
    //           0.80
    ]},          0.75
    ))$.         0.69
    istoitu      0.68
    ')}}">       0.67
    )<<          0.66
    '}>          0.65
    "]').        0.64
    ']").        0.63
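
    The dashboard does not say how these logit scores are computed. A common convention for sparse-autoencoder feature dashboards, assumed here rather than confirmed by this page, is to take the feature's decoder direction, dot it with each token's column of the model's unembedding matrix, and list the tokens with the largest (positive) and smallest (negative) results. The sketch uses toy vectors; `logit_effects` and `top_and_bottom` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(feature_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct effect of a feature on each token's logit.

    ASSUMPTION: the score is the dot product of the feature's decoder
    direction with the token's unembedding column.
    """
    return {tok: sum(f * u for f, u in zip(feature_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example: a 2-dimensional residual stream and a three-token vocabulary.
feature_dir = [1.0, -0.5]
unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
effects = logit_effects(feature_dir, unembed)
positive, negative = top_and_bottom(effects, k=1)
print(positive, negative)  # [('a', 1.0)] [('b', -0.5)]
```

    Under this reading, the negative list above means the feature pushes down long-s spellings such as " itſelf" and " ſtill", while the positive list means it pushes up closing brackets, quotes, and <bos>, which fits the explanation of marking the end of a thought or sentence.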

    Activation Density: 0.258%
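
    Activation density is presumably the percentage of tokens in some evaluation corpus on which the feature is active. This page gives neither the corpus nor the activation threshold, so both are assumptions here. At 0.258%, the feature would fire on roughly 2.6 of every 1,000 tokens. A minimal sketch with a hypothetical helper `activation_density` and an assumed threshold of zero:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    ASSUMPTION: "active" means strictly greater than the threshold (default 0).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("no activations supplied")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# 258 active tokens out of 100,000 gives a density of about 0.258%.
sample = [0.0] * (100_000 - 258) + [1.0] * 258
print(round(activation_density(sample), 3))  # 0.258
```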

    No Known Activations