    Explanations

    sh followed by common word endings

    Negative Logits
    an          2.83
    ли          2.61
    er          2.30
    theless     2.25
    ofthe       2.23
    ש           2.22
    н           2.17
                2.13
                2.06
                1.95
    Positive Logits
    ের          2.11
    OREM        1.96
    িম          1.84
    uggling     1.84
     rappel     1.74
     peers      1.72
    НА          1.71
    EEP         1.69
     reps       1.67
    िन          1.65
    Activation Density: 0.147%

    No Known Activations