    Explanations

    qualifying words used during mathematical reasoning.

    NEGATIVE LOGITS
    iyorum
    -0.07
    isko
    -0.06
    ıyorum
    -0.06
    ルク
    -0.06
     πρέπει
    -0.06
    rire
    -0.06
    .hasMore
    -0.06
     دارم
    -0.06
    不可
    -0.05
    (can
    -0.05
    POSITIVE LOGITS
     would
    0.45
    would
    0.38
     Would
    0.36
    Would
    0.35
     wouldn
    0.29
     zou
    0.24
     skulle
    0.24
     würde
    0.23
     serait
    0.23
     Wouldn
    0.22
    Activation Density 0.356%

    No Known Activations