    Negative Logits
     subt         -0.09
    recht         -0.09
    -fe           -0.08
    .Mock         -0.08
    .FE           -0.08
     Влад         -0.08
     foot         -0.08
     net          -0.08
    _features     -0.07
     Ethics       -0.07
    Positive Logits
     Wis          0.09
    THER          0.08
     MEMORY       0.08
                  0.08
    kiss          0.08
    сці           0.07
     Perc         0.07
    romise        0.07
    絲襪          0.07
    isoft         0.07
    Act Density 0.001%

    No Known Activations