    Explanations

    terms associated with simplicity, basicness, or a lack of sophistication

    Negative Logits
    Token       Logit
    еч          -0.16
    agate       -0.14
    thew        -0.14
    رز          -0.13
    ean         -0.13
    aku         -0.13
    ordova      -0.13
    iams        -0.13
    ings        -0.13
    帯          -0.13
    Positive Logits
    Token       Logit
    /simple     0.15
    uder        0.15
    /raw        0.15
    caller      0.15
    eti         0.14
    /big        0.14
    ترین        0.14
    117         0.14
    inte        0.14
    /original   0.13
    Activation Density: 0.026%

    No Known Activations