INDEX
    Explanations

    fractions/proportions

    Negative Logits
    Token      Logit
    "IONS"     -0.30
    "STS"      -0.28
    "OPS"      -0.26
    "DED"      -0.26
    "sts"      -0.26
    " OPS"     -0.25
    "KP"       -0.25
    " KP"      -0.25
    "✍"        -0.25
    "kp"       -0.25
    Positive Logits
    Token        Logit
    "对他们"      0.25
    "恩"          0.25
    "IMATION"    0.24
    " печ"       0.24
    "brew"       0.24
    "citation"   0.24
    "受益"        0.24
    "ayer"       0.24
    "duino"      0.24
    "辛苦"        0.24
    Act Density 0.011%

    No Known Activations
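
    The activation density above can be read as the share of token positions on which the feature fires. The page does not say how the figure is computed, so the following is only a minimal sketch under that assumption; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "Act Density" means (active positions) / (all positions).
    The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 11 active positions out of 100,000 tokens.
sample = [0.0] * 99_989 + [1.5] * 11
density = activation_density(sample)
print(f"{density:.3%}")  # 0.011%
```

    At a density this low, a feature fires on roughly one token in nine thousand, which fits with a small evaluation sample showing no known activations.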