    Explanations

    punctuation and formatting elements within the text

    Negative Logits
    " již"         -0.67
    " Doch"        -0.66
    "omiast"       -0.65
    " Jednak"      -0.64
    " lecz"        -0.60
    ",’’"          -0.59
    " deoarece"    -0.59
    " אשר"         -0.59
    " Porém"       -0.58
    " doch"        -0.57
    Positive Logits
    " FUCKING"     0.97
    " fucking"     0.93
    "basically"    0.90
    " Basically"   0.90
    " pretty"      0.90
    " goddamn"     0.88
    " REALLY"      0.86
    "pretty"       0.85
    " basically"   0.85
    "Basically"    0.83
    Activation Density: 0.415%
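
The density figure is easiest to read as a fraction of tokens. The dashboard does not define it, so the sketch below assumes the common reading: the share of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to reproduce the 0.415% figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the source dashboard does not state its definition.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 415 of 100,000 tokens.
sample = [1.0] * 415 + [0.0] * (100_000 - 415)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.415% corresponds to roughly 4 active tokens per 1,000 sampled.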

    No Known Activations