INDEX
    Explanations
    Negative Logits
     unſer        -0.73
    <unused79>    -0.71
    <unused41>    -0.71
     ſeyn         -0.71
    <unused23>    -0.71
    <unused14>    -0.71
    <unused52>    -0.71
    <unused74>    -0.71
    [@BOS@]       -0.70
    <unused1>     -0.70
    Positive Logits
    -                0.61
     P               0.44
    paddingBottom    0.40
    _                0.40
     V               0.39
     p               0.39
     C               0.39
                     0.39
    ug               0.38
     Sha             0.38
    Act Density 0.001%

    No Known Activations