    Negative Logits
    Token        Logit
    "a"          1.66
    "il"         1.45
    " a"         1.44
    "at"         1.21
    "y"          1.21
    "it"         1.10
    "al"         1.09
    "in"         1.02
    "ur"         1.02
    "ant"        0.98

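    Positive Logits
    Token        Logit   Gloss
    " powers"    1.11
    "Powers"     1.09
    "powers"     0.89
    " дов"       0.88
    " Powers"    0.84
    " stesse"    0.78    Italian: "same" (fem. pl.)
    " другие"    0.77    Russian: "other"
    " полномо"   0.76    Russian: start of "полномочия" ("powers, authority")
    " институт"  0.75    Russian: "institute"
    " της"       0.75    Greek: "of the" (fem. genitive article)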
    Activation density: 0.006%

    No known activations.