    Explanations

    Spanish/French definite articles

    Negative Logits
    Token        Value
    1            2.06
    2            1.98
    9            1.95
    들이         1.91
    7            1.85
    5            1.80
    8            1.79
    3            1.74
    4            1.73
    offic        1.72
    Positive Logits
    Token        Value
    (blank)      1.99
    มี           1.92
    (blank)      1.73
    Desarrollo   1.66
    (blank)      1.66
    ิน           1.61
    و            1.60
    ڙ            1.55
    urile        1.50
    𝘔            1.49
    Act Density 0.081%

    No Known Activations