INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token               Logit
    "bruch"             1.39
    " Platz"            1.38
    "affee"             1.36
    " पड़ता"             1.35
    "rón"               1.33
    " ክፍል"              1.33
    " sucedido"         1.33
    "okon"              1.32
    "ného"              1.31
    " exceptionnelle"   1.31

    Positive Logits
    Token               Logit
    "<bos>"             1.86
    " similarities"     1.70
    " люди"             1.69
    " few"              1.67
    "いくつ"            1.62
    " criminals"        1.62
    "兩個"              1.57
    " skills"           1.57
    " عوامل"            1.57
    " tools"            1.55

    Activation Density: 0.450%

    No Known Activations