INDEX
    Explanations

    articles or determiners in various contexts

    Negative Logits
    Germain
    -0.83
    theless
    -0.79
     Disqus
    -0.71
    ization
    -0.71
     Mahomet
    -0.69
     iſt
    -0.69
    jména
    -0.69
    -0.66
     مشين
    -0.66
     impar
    -0.65
    Positive Logits
    Σε
    1.12
     à
    1.00
     BorderRadius
    0.96
     the
    0.89
     σε
    0.88
    Về
    0.86
    日在
    0.85
    }}]{
    0.85
    ]<<
    0.84
    ()]
    
    0.83
    Act Density 0.023%

    No Known Activations