    Explanations

    Smallness/unimportance

    Negative Logits
    "\tdist"        -0.07
    "Sequence"      -0.07
    " گ"            -0.07
    (not shown)     -0.06
    "alth"          -0.06
    " courteous"    -0.06
    "qh"            -0.06
    " Lebanon"      -0.06
    "Higher"        -0.06
    (not shown)     -0.06
    Positive Logits
    "lady"          0.07
    " assim"        0.07
    "brit"          0.06
    " allowable"    0.06
    "-life"         0.06
    "قيق"           0.06
    " VERIFY"       0.06
    "linik"         0.06
    "λικά"          0.06
    "Κ"             0.06
    Activation Density: 0.038%

    No Known Activations