INDEX
    Explanations
    Negative Logits
    "ect"       -0.16
    "bah"       -0.15
    "arez"      -0.15
    "Higher"    -0.15
    "eker"      -0.14
    "emma"      -0.14
    "URRED"     -0.14
    "ther"      -0.14
    "higher"    -0.14
    "ocker"     -0.14
    Positive Logits
    " old"      0.71
    "-old"      0.59
    "old"       0.52
    " olds"     0.52
    ".old"      0.46
    " OLD"      0.42
    "olds"      0.40
    "(old"      0.40
    "_old"      0.40
    " Old"      0.40
    Act Density 0.032%

    No Known Activations