    NEGATIVE LOGITS
    Logit   Token
    1.08    " "
    0.88    " is"
    0.85    " on"
    0.84    (token missing)
    0.79    "hed"
    0.78    "с"
    0.77    " that"
    0.76    "was"
    0.75    " of"
    0.75    " to"
    POSITIVE LOGITS
    Logit   Token
    0.96    "ב"
    0.94    "ம்"
    0.93    "0"
    0.93    "ە"
    0.88    "ם"
    0.87    "போது"
    0.86    (token missing)
    0.86    "b"
    0.86    "۰"
    0.83    " iniciativa"
    Activation Density: 0.023%

    No Known Activations