INDEX
    Explanations

    a variety of function words and indicators of grammatical structure

    Negative Logits
    arrera
    -0.17
     ÙĨب
    -0.15
    æ°ı
    -0.15
    ccion
    -0.14
     horns
    -0.14
    esk
    -0.14
    antage
    -0.14
    arius
    -0.14
    ivative
    -0.13
    guide
    -0.13
    Positive Logits
     Dün
    0.17
     tez
    0.14
     Dum
    0.14
    _lite
    0.13
    dum
    0.13
     Trinidad
    0.13
     ancestral
    0.13
    ØŃاد
    0.13
    icontrol
    0.13
    oller
    0.13
    Activation Density: 0.026%

    No Known Activations