    Explanations

    references to comments or commentary in discussions

    Negative Logits
    Token              Logit
    "combe"            -0.16
    "pel"              -0.15
    "emouth"           -0.15
    "ouz"              -0.15
    "ning"             -0.14
    "iber"             -0.14
    "yon"              -0.14
    "sey"              -0.14
    "aln"              -0.14
    " approximation"   -0.14
    Positive Logits
    Token              Logit
    "aries"             0.31
    "aires"             0.24
    "ary"               0.23
    "eting"             0.22
    "ariat"             0.21
    "ators"             0.19
    "ers"               0.18
    "ative"             0.18
    "atory"             0.17
    "aire"              0.17
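    The page does not say how these logit values are produced. A common approach for SAE feature dashboards, assumed here rather than confirmed by this page, is to multiply the feature's decoder direction by the model's unembedding matrix and report the most negative and most positive entries. A minimal sketch with toy numbers; the function name `top_logit_effects`, the array shapes, and every value below are hypothetical and are not the real weights behind the tables above:

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary (assumed method).

    Hypothetical shapes: w_dec_row is (d_model,), W_U is (d_model, n_vocab),
    and vocab[j] is the string for token id j.
    """
    logits = w_dec_row @ W_U                  # direct effect on each token's logit
    order = np.argsort(logits)                # token ids sorted ascending by effect
    negative = [(vocab[j], float(logits[j])) for j in order[:k]]
    positive = [(vocab[j], float(logits[j])) for j in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, a 4-token vocabulary (values are made up).
vocab = ["aries", "combe", "ary", "pel"]
w = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.1, 0.2, 0.0],
                [5.0, 5.0, 5.0, 5.0]])
neg, pos = top_logit_effects(w, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

    Because only the direct path through the unembedding is used, this view ignores any effect the feature has through later layers.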
    Activation density: 0.030%
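    An activation density of 0.030% means the feature is active on roughly 3 out of every 10,000 token positions. The sketch below assumes density is defined as the fraction of positions where the activation exceeds zero; the page does not state its threshold, and the helper `activation_density` is hypothetical:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumption: a threshold of 0.0 (any positive activation counts as firing).
    """
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Toy example: 3 nonzero activations out of 10,000 token positions.
acts = np.zeros(10_000)
acts[[12, 5_000, 9_999]] = [0.8, 1.5, 0.2]
print(f"{activation_density(acts):.3%}")  # 0.030%
```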

    No Known Activations