INDEX
    Negative Logits
    " themſelves"    -0.89
    " myſelf"        -0.88
    " Jefus"         -0.88
    " الرياضيه"      -0.86
    " Majefty"       -0.83
    " ſeveral"       -0.83
    " Houſe"         -0.82
    " itſelf"        -0.82
    " becauſe"       -0.82
    " himſelf"       -0.81
    Positive Logits
    "ise"       0.45
    "i"         0.43
    "ist"       0.42
    "is"        0.42
    "ish"       0.41
    "ix"        0.40
    " lives"    0.40
    "item"      0.38
    "er"        0.37
    "ew"        0.37
    Activation Density: 0.392%

    No Known Activations