INDEX
    Explanations

    references to specific authors and research citations

    Negative Logits
    "eyse"          -0.16
    "emos"          -0.14
    "illiseconds"   -0.13
    "ivre"          -0.13
    "iid"           -0.13
    "yna"           -0.13
    "EMU"           -0.13
    "otropic"       -0.13
    "idot"          -0.13
    "awan"          -0.13

    Positive Logits
    "201"           0.14
    " Dice"         0.14
    " etc"          0.13
    " hav"          0.13
    " gent"         0.13
    " Hav"          0.13
    " Chest"        0.13
    "199"           0.13
    " Bd"           0.12
    "194"           0.12
    Act Density 0.023%

    No Known Activations