INDEX
    Explanations
    NEGATIVE LOGITS
    m           2.72
    ل           2.41
    ва          2.16
    speople     2.03
                1.98
    g           1.97
    ר           1.92
    an          1.86
    ن           1.84
    t           1.81
    POSITIVE LOGITS
    Olá         1.98
     टूर्        1.83
    うま        1.81
                1.80
    रोवर        1.79
                1.79
     ilgili     1.77
    τι          1.73
    äischen     1.73
    الج         1.73
    Activation Density: 0.977%

    No Known Activations