    NEGATIVE LOGITS
    Token          Logit
    "с"            1.24
    "ви"           1.23
    (not shown)    0.99
    "س"            0.96
    (not shown)    0.92
    "па"           0.91
    "тся"          0.91
    " в"           0.89
    "З"            0.89
    "ες"           0.86
    POSITIVE LOGITS
    Token          Logit
    "↵↵"           1.40
    " sacrifice"   1.38
    " sacrifices"  1.19
    (not shown)    1.13
    " Sacrifice"   1.12
    "el"           1.09
    "y"            1.07
    "ro"           1.05
    "ed"           1.05
    " sacrificed"  1.05
    Act Density 0.016%

    No Known Activations