INDEX
    Explanations

    Tokens before periods or special characters

    Specific suffixes and prefixes

    NEGATIVE LOGITS
    \<^            -1.32
    ratulations    -1.31
     Roskov        -1.27
    leſs           -1.26
    >\<^           -1.26
     itſelf        -1.24
    ^(@)           -1.24
    intenance      -1.24
    elfare         -1.22
    litude         -1.21
    POSITIVE LOGITS
    k               0.88
    us              0.85
    il              0.81
    .               0.80
    es              0.80
    ss              0.79
    v               0.79
    um              0.78
    te              0.76
    g               0.76
    Activation Density: 0.838%

    No Known Activations