    Explanations
    No Explanations Found
    Negative Logits
    " shove"      -0.69
    " swipe"      -0.67
    "Rub"         -0.63
    " rese"       -0.60
    "pmwiki"      -0.59
    " smack"      -0.58
    "entimes"     -0.58
    " scratch"    -0.58
    "NRS"         -0.58
    " jammed"     -0.57
    Positive Logits
    "ĸļ"            0.80
    "ifer"          0.73
    "aido"          0.71
    "rer"           0.69
    "frames"        0.69
    "agos"          0.68
    "bear"          0.68
    "assic"         0.68
    "efficients"    0.66
    "onel"          0.66
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.