    Negative Logits
      " interests"    -1.25
      "interests"     -1.12
      " Interests"    -1.02
      "Interests"     -0.97
      " Interest"     -0.89
      "interest"      -0.89
      " interest"     -0.89
      " interesses"   -0.88
      " Interessen"   -0.88
      " intereses"    -0.88
    Positive Logits
      " st"           0.49
      "AttributeSet"  0.46
      "evich"         0.45
      (not rendered)  0.45
      " step"         0.45
      " white"        0.44
      " mid"          0.43
      " am"           0.43
      " ic"           0.43
      "Step"          0.43
    Activation Density: 0.004%

    No Known Activations