INDEX
    Explanations

    expressions of apology or regret

    Negative Logits
    Token               Logit
    "alo"               -0.15
    "alli"              -0.15
    " possibilities"    -0.15
    "irk"               -0.15
    "elib"              -0.14
    "ini"               -0.14
    "ets"               -0.14
    "ali"               -0.14
    "stadt"             -0.14
    " extrav"           -0.13
    Positive Logits
    Token               Logit
    "kus"               0.19
    "813"               0.19
    " about"            0.16
    "apat"              0.16
    " meant"            0.15
    "éĮĦ"               0.15
    "/not"              0.15
    "isser"             0.15
    "ably"              0.15
    " for"              0.15
    Activation Density: 0.021%

    No Known Activations