    Explanations

    digits and numbers

    Negative Logits
    " feme"        -0.08
    " dining"      -0.07
    "Bell"         -0.07
    "Kategorie"    -0.07
    " Dining"      -0.07
    " Moore"       -0.07
    "handlungen"   -0.07
    " compl"       -0.07
    " élég"        -0.07
    " kvinn"       -0.07

    Positive Logits
    " screening"     0.08
    " постепенно"    0.08
    " যৌ"            0.08
    "ortic"          0.08
    " afore"         0.08
    "(screen"        0.08
    " সো"            0.07
    " раск"          0.07
    " দুর"           0.07
    " vajad"         0.07

    Act Density 0.029%

    No Known Activations