    Negative Logits
    " insets"      -0.10
    " imperson"    -0.09
    "afone"        -0.09
    "329"          -0.09
    "erse"         -0.09
    " Tam"         -0.09
    "gebn"         -0.08
    "inois"        -0.08
    "ufe"          -0.08
    " inne"        -0.08
    Positive Logits
    " letters"     0.27
    " Letters"     0.22
    "Letters"      0.21
    " choices"     0.21
    "letters"      0.21
    " options"     0.17
    "choices"      0.15
    " letter"      0.14
    " Choices"     0.14
    " choice"      0.14
    Activation Density: 0.035%

    No Known Activations