INDEX
    Explanations

    special characters and lists

    Negative Logits
    Token              Value
    " I"               0.57
    "них"              0.49
    "ية"               0.49
    "ுள்ளார்"          0.48
    " on"              0.47
    (blank)            0.46
    "ILE"              0.45
    "ifik"             0.45
    "ING"              0.44
    " prestazioni"     0.44
    Positive Logits
    Token              Value
    "u"                0.71
    "ის"               0.70
    (blank)            0.70
    "ى"                0.69
    "and"              0.63
    (blank)            0.61
    "ين"               0.61
    "b"                0.59
    "p"                0.59
    "ل"                0.59
    Activation Density: 2.690%

    No Known Activations