INDEX
    Explanations

    mathematical expressions and norms in a document

    Negative Logits
    "stra"      -0.17
    "½"         -0.16
    " Sawyer"   -0.15
    "iste"      -0.15
    "empo"      -0.15
    "ÏģιÏĥ"     -0.14
    " Lucas"    -0.14
    "دÙĪØ¯"    -0.14
    "¡"         -0.14
    "dra"       -0.14
    Positive Logits
    "MI"        0.16
    "vit"       0.15
    "éģİ"       0.15
    " fiat"     0.15
    " («"       0.14
    "átka"      0.14
    "ục"        0.14
    "mgr"       0.14
    " ÑħÑĥд"    0.14
    "cznie"     0.14
    Activation Density 0.063%

    No Known Activations