INDEX
    Explanations

    sentences that discuss the complexity or structure of sentences, particularly regarding their length and grammatical elements

    Negative Logits
    Token              Logit
     pleaſure          -0.94
     myſelf            -0.91
     uſed              -0.89
    setVerticalGroup   -0.88
     himſelf           -0.87
    ſelf               -0.86
     themſelves        -0.86
     purpoſe           -0.85
     houſe             -0.83
     Jefus             -0.83
    Positive Logits
    Token   Logit
            0.61
     nice   0.60
    ...     0.59
     ça     0.58
     b      0.57
     :)     0.57
    ....    0.56
     w      0.55
     shit   0.55
     cool   0.54
    Activation Density: 0.027%

    No Known Activations