INDEX
    Explanations

    punctuation

    Negative Logits
     sister      -0.08
     suc         -0.08
    (blank)      -0.08
     Genève      -0.07
     кандидат    -0.07
     Victoria    -0.07
     Paula       -0.07
     dele        -0.07
     sério       -0.07
    (blank)      -0.07
    Positive Logits
     adv         0.08
    Straight     0.08
     નો          0.07
     નું          0.07
     Straight    0.07
    🏼           0.07
     ਵਰ          0.07
     প্রতি        0.07
    Arthur       0.07
    (blank)      0.07
    Activation Density: 0.013%

    No Known Activations