    Negative Logits
    yip           -0.72
    utterstock    -0.65
    itiveness     -0.57
    hill          -0.56
    manship       -0.55
    ibaba         -0.54
    Kings         -0.52
    spot          -0.52
    shit          -0.52
    VALUE         -0.51
    Positive Logits
    ace            0.66
    astern         0.65
    incial         0.62
    airo           0.61
    adan           0.60
    rane           0.60
    emporary       0.58
    antes          0.57
     Norn          0.56
    ド             0.55
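Dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and then ranking the per-token results. The sketch below assumes exactly that, with no normalization or bias term. It uses toy numbers rather than this feature's real weights, and the helper names `logit_effects` and `top_and_bottom` are hypothetical. It is not a confirmed description of how this page computed its values.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption (hypothetical layout): decoder_dir has length d_model and w_u is a
    d_model x vocab matrix stored as a list of rows. Returns one value per token.
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[r] * w_u[r][t] for r in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy example with made-up values: d_model = 2, vocabulary of 3 tokens.
decoder_dir = [1.0, -0.5]
w_u = [[0.2, -0.4, 0.6],
       [0.4, 0.2, -0.2]]
vocab = ["a", "b", "c"]

neg, pos = top_and_bottom(logit_effects(decoder_dir, w_u), vocab, k=1)
print(neg, pos)
```

A real dashboard would run the same ranking over the full vocabulary with k = 10, which matches the ten entries listed in each table.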
    Act Density 0.125%
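Activation density is usually reported as the share of sampled tokens on which the feature fires. Under that reading, 0.125% corresponds to roughly one token in 800. The minimal sketch below assumes a feature "fires" when its activation is strictly above zero and that activations arrive as a flat list, one value per token. `activation_density` is a hypothetical helper, not the dashboard's own code.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly above zero.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# One firing token out of 800 sampled tokens reproduces the density shown above.
acts = [0.0] * 799 + [1.3]
print(f"Act Density {activation_density(acts):.3f}%")  # Act Density 0.125%
```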

    No Known Activations