    NEGATIVE LOGITS
    Token           Logit
    "طان"           -0.06
    "xab"           -0.06
    "Bal"           -0.06
    " TestCase"     -0.06
    (blank)         -0.06
    " seal"         -0.06
    "xaa"           -0.06
    " أج"           -0.06
    "\tcase"        -0.06
    "’y"            -0.06

    POSITIVE LOGITS
    Token           Logit
    " linguistic"   0.09
    "istical"       0.08
    "istics"        0.07
    " Ludwig"       0.07
    "etric"         0.07
    " lingu"        0.07
    " перш"         0.07
    "фров"          0.07
    " Lingu"        0.07
    " оди"          0.06
    Activation Density: 0.007%

    No Known Activations