    Explanations

    special characters and punctuation marks

    Negative Logits
    Token             Logit
    "ſelves"          -1.05
    "BibitemShut"     -1.00
    " iſt"            -0.97
    " Anſ"            -0.97
    " Efq"            -0.93
    "ſelf"            -0.90
    " Majefty"        -0.90
    " XNUMX"          -0.90
    " raiſ"           -0.88
    " Arhivirano"     -0.84
    Positive Logits
    Token             Logit
    ";"               0.68
    "t"               0.63
    "[toxicity=0]"    0.61
    "racene"          0.59
    "!"               0.58
    "ٰ"               0.57
    "s"               0.55
    "://"             0.54
    "€“"              0.54
    "\|"              0.52
    Activation Density: 0.019%

    No Known Activations