INDEX
    Explanations

    numerical identifiers or codes

    Negative Logits
    ships     -0.19
    ship      -0.18
    iem       -0.17
    pill      -0.17
    orio      -0.17
    views     -0.17
    liness    -0.16
    table     -0.16
    vals      -0.15
    iw        -0.15
    Positive Logits
    ughter         0.19
    eker           0.17
    ulously        0.16
    entially       0.16
    (whitespace)   0.15
    ugh            0.15
    emp            0.15
    ³³³³³          0.15
    __("           0.15
    ity            0.14
    Activation Density: 0.102%

    No Known Activations