INDEX
    Explanations

    questions or references to people

    Negative Logits
    "robat"       -0.20
    "bian"        -0.17
    "aises"       -0.17
    "ted"         -0.16
    "bol"         -0.16
    "wr"          -0.16
    "алеж"        -0.15
    "nt"          -0.15
    " Darling"    -0.15
    "pure"        -0.15
    Positive Logits
    "ever"        0.33
    "ops"         0.33
    " else"       0.29
    "opi"         0.29
    "oping"       0.28
    "osh"         0.28
    "op"          0.25
    "a"           0.24
    " am"         0.24
    " amongst"    0.23
    Activation Density: 0.023%

    No Known Activations