INDEX
    Explanations

    numerical references, likely related to citations or statistics in academic texts

    Negative Logits
    zen       -0.15
    cus       -0.15
    ÙĨا       -0.15
    utzer     -0.15
    ataka     -0.15
    ĥ½        -0.14
    enary     -0.14
    å¿ľ       -0.14
     Morrow   -0.14
    ared      -0.14
    Positive Logits
    ff        0.24
    _ff       0.16
     ff       0.16
     n        0.15
    æ´ŀ       0.15
     bottoms  0.15
    fff       0.14
    foot      0.14
     bottom   0.14
    ottom     0.14
    Activation Density 0.068%

    No Known Activations