    Explanations

    lists with punctuation and emojis

    Negative Logits
    Isn             0.41
    [missing]       0.39
     ı              0.37
    rer             0.36
    ///             0.35
     h              0.35
    ıt              0.35
    ill             0.34
     sia            0.34
    ń               0.34
    Positive Logits
     ampere         0.89
     adenine        0.87
     furthermore    0.75
    Dieser          0.73
     additionally   0.73
    <0xF1>          0.72
     entsprechend   0.70
     Hauptstadt     0.70
     unmittel       0.70
     aforementioned 0.68
    Activation Density: 0.001%

    No Known Activations