INDEX
    Explanations
    Negative Logits
     matt    -0.08
    обр    -0.07
     masc    -0.07
     arisen    -0.07
    .zero    -0.07
     вой    -0.07
     baca    -0.07
     diagn    -0.07
     mosa    -0.07
    (token not rendered)    -0.07
    POSITIVE LOGITS
     verhe    0.08
     Technologie    0.08
     Reck    0.08
     Diesel    0.08
     نسبة    0.08
     Richardson    0.08
    -relative    0.08
    (token not rendered)    0.08
    ::*;↵    0.08
    akuwa    0.07
    Act Density 0.001%

    No Known Activations