    Negative Logits
    Token        Logit
    " sn"        -0.07
    " Gary"      -0.07
    " действ"    -0.07
    " прод"      -0.07
    "бол"        -0.07
    ".Param"     -0.07
    (blank)      -0.07
    (blank)      -0.07
    "мож"        -0.07
    " ат"        -0.07
    Positive Logits
    Token         Logit
    " electrom"   0.09
    " geometr"    0.08
    " plaus"      0.08
    "riu"         0.08
    " seal"       0.07
    " Alger"      0.07
    "guess"       0.07
    " yal"        0.07
    " hence"      0.07
    " locom"      0.07
    Activation Density: 0.028%

    No Known Activations