INDEX
    Explanations

    punctuation and common words

    Negative Logits
    ,         1.11
    .         0.68
    ר         0.62
    ق         0.61
    ان        0.59
    (blank)   0.59
    í         0.57
    ل         0.55
    cular     0.54
    er        0.53
    Positive Logits
     valamint     0.82
     odnosno      0.77
     illetve      0.76
     जबकि         0.74
     czyli        0.73
     który        0.68
     hanno        0.67
     sogenannte   0.66
     takže        0.64
     اتارنا       0.63
    Act Density 0.250%

    No Known Activations