INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token            Logit
    "an"             1.08
    "ط"              1.07
    " Embora"        1.05
    " proteína"      1.04
    " separado"      1.03
    " aberta"        1.03
    " embora"        1.02
    "わからない"      1.01
    " appellees"     1.00
    " residuos"      0.99
    Positive Logits
    Token            Logit
                     0.93
    "ંગ"             0.87
    "ی"              0.87
    "‍♀️"             0.86
    " vei"           0.80
    "nout"           0.79
    ","              0.79
                     0.78
    " Nav"           0.77
    "nous"           0.76
    Activation Density: 0.003%

    No Known Activations