    Explanations
    Negative Logits
        " maraming"          0.75
        " больш"             0.74
        "ूठी"                 0.74
        " wholeheartedly"    0.68
        " bieden"            0.67
        " আরও"               0.67
        " eenvoudig"         0.66
        " concludes"         0.66
        " Ganz"              0.65
        " évidence"          0.65
    Positive Logits
        " و"                 0.83
        " и"                 0.82
                             0.82
        " ਅਤੇ"               0.79
        "などの"              0.79
                             0.71
        " AND"               0.71
        " and"               0.70
        "т"                  0.70
        "ת"                  0.70
    Activation Density: 0.003%

    No Known Activations