INDEX
    Explanations

    phrases indicating degrees of change or intensity

    Negative Logits
    s         -0.16
    lin       -0.15
    nt        -0.14
    อะ        -0.14
    umer      -0.14
    нова      -0.13
    orting    -0.13
    istar     -0.13
    CRET      -0.13
    ças       -0.13
    Positive Logits
    quier     0.17
    씩        0.17
    /stdc     0.15
    leton     0.15
    许        0.15
     CDDL     0.14
    -таки     0.14
    /all      0.14
    룬        0.14
     bit      0.14
    Act Density 0.045%

    No Known Activations