INDEX
    Explanations

    terms related to various types of "yes" or affirmative expressions

    Negative Logits
    efe       -0.20
    aland     -0.16
    affen     -0.15
    brick     -0.15
    olly      -0.15
    mmo       -0.15
    pering    -0.15
    uten      -0.14
    otland    -0.14
    PERT      -0.14
    Positive Logits
    isseur    0.21
    ymous     0.20
    oit       0.18
    ises      0.18
    xious     0.17
    elle      0.17
    ise       0.16
     Longer   0.15
    iu        0.15
     longer   0.15
    Activation Density: 0.027%

    No Known Activations