    NEGATIVE LOGITS
    t          1.80
    LE         1.41
    d          1.24
    the        1.17
    \t\t       1.13
    that       1.10
    3          1.09
    Vitamin    1.06
    Q          1.06
    tio        1.05
    POSITIVE LOGITS
    ي              1.74
    ла             1.37
    ن              1.36
    на             1.33
    ينا            1.27
    n              1.22
    (not shown)    1.21
    ль             1.20
    (not shown)    1.17
    н              1.16
    Activation Density: 0.012%

    No Known Activations