INDEX
    Explanations

    measurements and calculations

    Negative Logits
    Token            Logit
    " Kra"           -0.08
    "🏼"             -0.08
    " demonstr"      -0.08
    " Kamp"          -0.08
    " Presidency"    -0.08
    " Tex"           -0.07
    "ږ"              -0.07
    "stag"           -0.07
    "leda"           -0.07
    "/ge"            -0.07

    Positive Logits
    Token            Logit
    " lap"           0.08
    "Hmm"            0.07
    " மண"            0.07
    "lap"            0.07
    "Lit"            0.07
    " dioxide"       0.07
    " lesbians"      0.07
    "ural"           0.07
    " Mile"          0.07
    (missing)        0.07
    Activation Density: 0.056%

    No Known Activations