    Negative Logits
    "APs"       0.46
    "outed"     0.44
    "shifted"   0.43
    "varlak"    0.42
    "Ob"        0.42
    "ž"         0.42
    "insel"     0.42
    "changed"   0.41
    "NIH"       0.41
    "RE"        0.41
    Positive Logits
    " communion"   0.49
    "зион"         0.45
    "ственным"     0.45
    " folder"      0.44
    "<0xEC>"       0.44
    " разум"       0.44
    " ма"          0.44
    " rhythm"      0.44
    " patriotic"   0.44
    " Маке"        0.43
    Act Density 0.005%

    No Known Activations