    Negative Logits
                      -0.59
    ↵↵            -0.58
    I             -0.55
    .             -0.54
    FORMANCE      -0.53
    makeText      -0.53
    }             -0.53
    :             -0.53
    ↵↵↵           -0.52
    ;             -0.52
    Positive Logits
     simplif      1.34
     hentai       1.29
     milf         1.24
     emphat       1.23
     Souha        1.22
     michelin     1.21
     Mlle         1.21
     ritard       1.19
     intermitt    1.16
     incess       1.15
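    Dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and reporting the tokens with the largest and smallest scores. The sketch below assumes that method; the names `decoder_direction`, `unembed`, `logit_effects`, and `top_and_bottom_logits`, and the toy values, are hypothetical and do not reproduce how the numbers above were actually computed.

```python
from typing import List, Tuple


def logit_effects(decoder_direction: List[float],
                  unembed: List[List[float]]) -> List[float]:
    """Dot the (hypothetical) feature decoder direction with every column of
    an unembedding matrix of shape d_model x vocab_size."""
    d_model = len(decoder_direction)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom_logits(effects: List[float], vocab: List[str],
                          k: int) -> Tuple[List[Tuple[str, float]],
                                           List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy model: d_model = 2, vocabulary of 4 tokens (illustrative only).
    vocab = ["a", "b", "c", "d"]
    decoder_direction = [1.0, -0.5]
    unembed = [[0.2, -1.0, 0.5, 0.0],
               [0.4, 0.0, -1.0, 1.0]]
    effects = logit_effects(decoder_direction, unembed)
    print(top_and_bottom_logits(effects, vocab, 2))
```

    Sorting once and slicing from both ends keeps the two lists consistent with each other; a real model would use a tensor library for the projection.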
    Activation Density 0.489%
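    Activation density is commonly reported as the share of token positions on which the feature fires, i.e. has an activation above zero. A minimal sketch, assuming a threshold of 0 and a flat list of per-token activations; `activation_density` is a hypothetical helper, not a library function, and the counts in the example are illustrative rather than taken from this feature.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


if __name__ == "__main__":
    # Illustrative: 489 active positions out of 100,000 tokens.
    acts = [1.0] * 489 + [0.0] * (100_000 - 489)
    print(activation_density(acts))
```

    Under these assumptions, a density of 0.489% means the feature is nonzero on roughly one token in two hundred.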

    No Known Activations