INDEX
    Explanations
    Negative Logits
    Token            Logit
    " myſelf"        -0.88
    " itſelf"        -0.78
    " themſelves"    -0.77
    " colored"       -0.76
    " Efq"           -0.75
    " ―――――"         -0.72
    " himſelf"       -0.71
    " foon"          -0.69
    " Hift"          -0.69
    " ſever"         -0.68
    Positive Logits
    Token            Logit
    "saraba"         0.55
    "pters"          0.53
    " as"            0.51
    "+#+#"           0.47
    "Personendaten"  0.47
    "gonic"          0.47
    "by"             0.46
    "rophobic"       0.45
    " crossorigin"   0.44
    "加坡"           0.43
    Activation Density: 1.733%

    No Known Activations