Index
    Explanations
    Negative Logits
    <0x80>
    0.55
    ع
    0.50
     grossesse
    0.50
     faut
    0.50
     decimal
    0.49
    Ok
    0.49
     délic
    0.49
    𝐞
    0.49
    0.48
     yung
    0.48
    Positive Logits
    libc
    0.61
     inventing
    0.56
     humanity
    0.54
    onents
    0.54
     humankind
    0.53
    osit
    0.53
    faceted
    0.53
     hordes
    0.52
     inhomogeneities
    0.52
    0.51
    Act Density 0.096%

    No Known Activations